THE  ACQUISITION  OF  SOME  AMERICAN  ENGLISH  DURATION 
PARAMETERS  BY  NONNATIVE  SPEAKERS  OF  ENGLISH 


By 

ANNA  MARIE  SCHMIDT 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN 
PARTIAL  FULFILLMENT  OF  THE  REQUIREMENTS 
FOR  THE  DEGREE  OF  DOCTOR  OF  PHILOSOPHY 


UNIVERSITY  OF  FLORIDA 


1988 


ACKNOWLEDGMENTS 

This study could not have been completed without the very generous
help and support provided by many individuals and groups. I am grateful
to those of my teachers at the University of Florida who gave me
the skills to perform this study, accepted my interdisciplinary interests,
and set the standards I hope to achieve. I am especially grateful
to the members of my dissertation committee: Dr. Howard Rothman,
Dr. Jean Casagrande, and Dr. W.S. Brown, Jr.

Dr. James E. Flege, whose course I attended at the Summer Institute
in Linguistics, was responsible for showing me how to apply those
skills in an area that has fascinated me.

I am also deeply grateful to my students, who kindly volunteered to
be subjects and who found me other subjects. The Royal Thai Embassy
in Washington, D.C. generously allowed me to interview and tape in the
Student Section. The Korean YMCA in Arlington, Virginia, volunteered
its staff. The Korean First Baptist Church of Silver Spring, Maryland,
also found subjects for me. Dr. A. Sydney Wilson performed
heroic service, which included development of contacts and taping
arrangements.

The research facilities for this study were provided in part by
Dr. Craig Linebaugh and the Department of Speech and Hearing of The
George Washington University. Other equipment was provided in part by
Mrs. Darlyn Wolvin and the Department of Speech of Prince George's
Community College. My colleagues in these departments employed and
encouraged me through the long endeavor.

I  am  grateful  to  Dr.  Carole  Reisen  for  advice  and  assistance  with 
the  statistical  analysis.  Dr.  James  Hillis  also  provided  invaluable 
statistical  advice.   Artie  Wilson  and  Dr.  Susan  Thomas  helped  with 
editing  and  lively  discussion. 

Finally,  I  would  like  to  thank  my  family  and  friends,  especially
Mr.  and  Mrs.  Felix  Schmidt  Jr.,  Mr.  Felix  Schmidt,  and  the  Tringas 
family,  for  their  support  during  my  graduate  education. 




TABLE  OF  CONTENTS 

Page 

ACKNOWLEDGMENTS  ii 

ABSTRACT   vi 

CHAPTERS 

1  INTRODUCTION   1 

General  Background   1 

Models  of  Second  Language  Pronunciation  Acquisition   .  .  6 

Phonetic  Models   8 

Phonological  Models   9 

Summary 12 

Phonetic  Interference  Model   13 

Phonetic  Interference   15 

Other  Evidence  for  A  Phonetic  Interference  Model    .  16 

Summary 21 

A  Prediction  of  the  Phonetic  Interference  Model   ....  23 

Limits  on  Accuracy   24 

New  Versus  Similar  Sounds  26 

Summary 28 

The  Hypotheses  of  This  Research 29 

2  DURATION  FACTORS   32 

Introduction   32 

Duration  of  Segments   33 

Duration  Parameters  in  English  and  Other  Languages   .  .  35 

Voice  Onset  Time  (VOT)    35 

Vowel  Duration   50 

Final  Consonant  Closure   56 

Vowel/Consonant  Ratio   61 

Summary 65 

3  METHODS 66 

Introduction   66 

Subjects   66 

Nonnative  Subject  Selection   67 

The  Age  Factor  in  Nonnative  Subject  Selection   ...  70 

Judgements  of  Accent  Strength   71 

Results  of  Rating  and  Discussion   73 




Experimental  Procedures  76 

Test  Materials   76 

Recording  Procedure   77 

Instrumentation   78 

Acoustic  Analysis  79 

Criteria  for  Parameter  Measurement   80 

Analysis  of  Raw  Data   81 

4  RESULTS 84 

Introduction   84 

Overview  of  Results 86 

Initial  Stop  Voicing   87 

VOT  for  Language  Groups 87 

VOT  for  Superior  Speakers 89 

Individual  VOT   92 

Final  Consonant  Closure  95 

Consonant  Closure  for  Language  Groups   95 

Consonant  Closure  for  Superior  Speakers   95 

Individual  Consonant  Closure   100 

Vowel  Duration   104 

Vowel  Duration  for  Language  Groups   104 

Vowel  Duration  for  Superior  Speakers    104 

Individual  Vowel  Duration   107 

Vowel/Consonant  Ratio   109 

Vowel/Consonant  Ratio  for  Language  Groups   109 

Vowel/Consonant  Ratio  for  Superior  Speakers   ....  109 

Vowel/Consonant  Ratio  for  Individuals  111

5  REVIEW  AND  CONCLUSIONS   113 

Introduction   113 

Discussion 115 

VOT  119

Final  Consonant  Closure  130 

Vowel  Duration   140 

Vowel/Consonant  Ratio   145 

Conclusions 149 

APPENDICES 

A  INSTRUCTIONS  AND  RATING  SCALE   152 

Instructions   152 

Sample  Rating  Sheet   153 

B  INSTRUCTIONS  FOR  EXPERIMENTAL  PROCEDURE   154 

C  LANGUAGE  BACKGROUND  QUESTIONNAIRE   155 

REFERENCES   156 

BIOGRAPHICAL  SKETCH   161 




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

THE  ACQUISITION  OF  SOME  AMERICAN  ENGLISH  DURATION 
PARAMETERS  BY  NONNATIVE  SPEAKERS  OF  ENGLISH 

by
Anna  Marie  Schmidt 
April,  1988 

Chairman:  Dr.  Howard  B.  Rothman 
Speech  Department 

This study investigated the claim that even superior bilinguals
will never achieve authenticity in the phonetic specifications of
their second language. Production of VOT for initial bilabial stop
consonants, consonant closure duration for final alveolar stop consonants,
vowel duration for /I, i/, and relative vowel duration before
voiced and voiceless consonants were examined for American English
speakers and for three groups of accented to superior bilinguals. The
bilinguals included native speakers of Arabic, Korean, Spanish, and
Thai.

The  nonnative  speakers  were  rated  on  strength  of  accent  and  the 
best  speakers  were  compared  statistically  with  the  native  English 
speakers.  The  superior  speakers  were  not  different  from  the  English 
speakers  on  any  of  the  measures  except  for  vowel  duration  relative  to 
consonant  voicing.   However,  individual  speakers  seemed  to  vary  widely 
within  the  superior  group  and  were  more  similar  to  other  speakers  of 
their  native  language  than  to  each  other  on  most  of  the  measures. 




Comparison  of  VOT  categories  indicated  that  superior  bilinguals 
from  all  languages  were  re-labeling  categories  to  achieve  authenticity 
in  English.   The  superior  speakers  had  also  acquired  a  variety  of 
timing  options  for  final  consonant  closure. 




CHAPTER  1 
INTRODUCTION 

General  Background 


Leather (1983), in a state-of-the-art article entitled "Second
language pronunciation learning and teaching," states that "pronunciation
is no longer to be taken as an indispensable component of
second — or  foreign — language  .  .  .  teaching  programmes"  (p.  198).  He 
sees  two  reasons  for  this.   First,  "communicative  function"  is 
currently  a  higher  priority  than  near-native  accent.  Second, 
"...  the  phonetic  and  psycholinguistic  foundations  for  training  no 
longer  look  as  secure  as  they  did  two  or  three  decades  ago  .  .  .  too 
little  is  known  about  pronunciation  learning  to  advocate  dogmatically 
any  particular  design  for  pronunciation  teaching"  (p.  198,  emphasis  in 
the  original).  This  statement  seems  to  indicate  that  the  traditional 
models  of  second  language  pronunciation  acquisition  do  not  work  in 
application. 

The  situation  described  by  Leather  might  have  arisen,  at  least  in 
part,  out  of  the  academic  separation  of  scholars  most  concerned  with 
pronunciation  learning  and  teaching.   There  currently  exist  three 
general  disciplines  that  deal  to  a  greater  or  lesser  extent  with  second 
language  pronunciation.   These  fields  are  phonetics,  phonology,  and 
second language teaching. The bases and aims of the three fields
differ; terminologies are often mutually unintelligible and methodologies
are poorly understood between fields.

Because  of  this,  it  is  difficult  to  determine  the  relevance  of 
findings  in  one  field  for  another  field.   There  is  also  a  problem  in 
locating  and  interpreting  relevant  findings  among  masses  of  information 
or  theoretical  argument.  Researchers  in  second  language  pronunciation 
operate  within  a  new  field,  which,  as  will  be  shown  here,  intersects 
phonetics  and  phonology  in  unaccustomed  ways. 

Second language research, whether focused on grammatical, phonological,
phonetic, or semantic acquisition of the new language, has only
recently (and not in all quarters) become what could be considered a
"respectable" academic pursuit in the United States. This may be due,
at least in part, to several factors related to what may be described
as the sociocultural view that Americans hold toward foreignness.
First, foreigners are generally associated with a lower social class
(i.e., poor immigrants and illegals) or with a type of
super-rich social class. Second, those persons traditionally associated
with foreigners in academic circles have been teachers and others
in applied fields. Educators of children and foreigners are generally
of low social status in this country. Admit it or not, it is difficult
for the researcher to overcome cultural xenophobia and its assumption
of superiority.

The  last  factor  which  seems  to  have  operated  to  reduce  or,  at  any 
rate,  not  encourage  researchers  to  study  second  language  acquisition  is 
the  object  of  study  in  second  language  research.   The  systems,  rules, 
or  patterns  displayed  by  the  second  language  learner  are  imperfect  in 
that they match neither the first language nor the second language.
The cliche used in general speech is "Broken English." This phrase
implies  something  to  be  corrected  or  fixed  in  some  way  rather  than 
something  worthy  of  study.   Happily,  the  factors  mentioned  above  are 
now  being  overcome  although  the  basic  tenets  of  the  field  of  second 
language  research  are  still  to  be  clarified. 

Today,  writers  like  Leather  (1983),  quoted  above,  find  that  the 
older,  more  established  fields  of  phonetics  and  phonology  have  little 
to  offer  in  terms  of  a  theoretical  foundation  for  pronunciation 
teaching.   The  problem  may  be  stated  simply  as  follows:  Is  second 
language  pronunciation  research  concerned  with  phonetics  or  phonology? 
Confusion  in  this  regard  seems  likely  to  have  hampered  advances  in 
second  language  pronunciation  research. 

What  is  meant  by  the  question,  "Is  second  language  pronunciation 
research  concerned  with  phonetics  or  phonology?"  and  why  is  the 
question  important?  Phonetics  is  generally  accepted  to  be  concerned 
with  the  articulatory  and  acoustic  realities  of  human  speech. 
Published  phonetic  studies  are  detailed  in  a  way  phonologists  often 
find irrelevant. Phonetics, as a field of study, is language independent
in the sense that the transcription methods aim at universal
applicability, and in that it attempts to apply principles of production
or perception across languages. An example of a phonetic issue is
categorical  perception  of  stop  consonants. 

The  study  of  phonology,  on  the  other  hand,  is  language  dependent. 
Classical  phonology  concerned  itself  with  the  sound  patterns  of  a 
particular  language,  while  more  recent  phonologies  attempt,  in 
addition,  to  derive  specific  patterns  from  more  general  (language 
independent) principles. Although modern phonology aims at universal
principles, its rules attempt to derive the pronunciation of one
particular  language  at  a  time. 

Looked  at  from  the  points  of  view  of  phonetics  and  phonology,  where 
do  the  productions  of  a  speaker  learning  a  new  language  fit?  The 
speaker  produces  neither  the  first  language  nor  the  second  language 
sound  pattern  exclusively.   Phoneticians  may  analyze  the  productions 
from  the  point  of  view  of  acoustics  and  articulation,  but  of  what  use 
is  that  to  the  student  or  educator  without  a  theory  of  acquisition? 
Phonologists  may  attempt  to  systematize  and  abstract  but  with  what 
exactly are they working? Is it a language as some have argued (an
"Interlanguage"), or is it a process similar to but not exactly like
what  a  child  experiences?  Are  the  assumptions  about  what  is  important 
or  relevant  made  by  both  the  phonetician  and  the  phonologist  correct 
for  this  material? 

In  applied  areas,  theoretical  distinctions  between  phonetics  and 
phonology  are  most  often  overlooked,  perhaps  because  of  reasons  offered 
at  the  beginning  of  this  chapter.   Examples  of  this  may  be  seen  in 
current  teaching  texts,  which  may  include  whatever  material  is  deemed 
teachable  and  useful.   A  good  example  is  a  text  entitled  General 
American  Speech  for  the  Bilingual  Spanish  Speaking  Student  (White, 
1979).   This  text  contrasts  the  articulation  of  phonemes  and  includes 
aspiration  or  timing  of  voiceless  plosives.  Most  interestingly,  White 
(1979)  states  that  the  fundamental  consideration  in  learning  to 
pronounce  English  is  retraining  of  the  use  of  breathing  patterns  (based 
on the differences in stress and syllabic intonation between the two
languages).



Clear  Speech  (Gilbert,  1984),  a  text  by  a  linguist,  focuses  very 
little  attention  on  segment  sized  (phonological)  units.   Rather, 
implementation  of  stress  and  intonation  in  syllables  forms  the  major 
thrust  of  the  text.  The  strategy  here  is  to  explain  voicing  as  a 
feature  and  its  effect  on  vowel  duration  rather  than  to  contrast  sets 
of  specific  sounds.   It  might  be  thought  that  phonetic  detail  plays 
little  part  in  this  workbook.   However,  the  inclusion  of  a  set  of  tapes 
with  the  text  provides  information  not  specified  in  the  writing.  This 
would  include,  for  example,  relative  vowel  duration.   Vowel  length  is 
discussed  in  the  text  partially  under  the  labels  of  "clear  and  unclear" 
vowels.  Clear  vowels  are  said  to  be  "full  (long),  can  be  stressed" 
while  unclear  vowels  are  "reduced,  (short),  cannot  be  stressed"  (p. 
19).   Specification  of  relative  vowel  length  is  provided  only  on  the 
taped  examples.   If  the  assumption  were  that  length  would  be  determined 
from  some  universal  phonetics,  no  taped  material  would  be  necessary. 

In  this  study,  one  recent  model  that  attempts  to  provide  a 
foundation  for  theoretical  and  applied  second  language  pronunciation 
acquisition  work  will  be  considered.   The  model  addresses  the  question: 
Is  second  language  pronunciation  research  concerned  with  phonetics  or 
phonology?  Briefly,  the  answer  given  by  the  proposed  model  says  that 
it  is  phonology  but  a  more  complete  phonology  than  has  been  envisioned 
previously.   This  study  will  examine  some  claims  made  by  a  model  of 
sound  acquisition  developed  recently  by  J.  Flege  (1979).   This  model 
attempts  to  combine  the  best  of  the  older  models  of  sound  acquisition 
while  avoiding  defects  that  Flege  sees  as  inherent  in  the  older  models. 
Nonnative  speakers  of  English,  in  acquiring  English,  may  make  changes 
in the production and timing of the articulatory gestures used in
speech. The inventory of articulatory gestures and implementation of
that  inventory  in  terms  of  such  factors  as  timing  and  placement  seems 
to  determine,  at  least  in  part,  whether  or  not  the  speakers  are 
perceived  by  native  speakers  as  having  a  foreign  accent.  One  aspect  of 
acquisition  will  be  examined  here:  Flege's  Phonetic  Interference  Model 
predicts  that  nonnative  speakers  cannot  achieve  complete  success  in 
pronunciation  of  a  second  language  especially  when  sounds  found  in  the 
new  language  (L2)  are  similar  to  sounds  found  in  the  first  language 
(L1).  The  present  descriptive  phonetic  study  investigates  the  extent
of  acquisition  of  some  timing  variables  in  English  by  speakers  of  four 
non-English  languages  who  are  perceived  to  have  little  or  no  foreign 
accent  in  order  to  test  Flege's  claim.   A  comprehensive  search  of  the 
literature  has  found  no  studies  that  have  looked  at  more  than  one  group 
of  excellent  second  language  speakers  in  detail. 

Models  of  Second  Language  Pronunciation  Acquisition 

In this section, models of second language pronunciation acquisition
as described by Flege in his 1979 dissertation will be discussed.
Also, problems found by Flege (1979) in traditional models will be
examined.

Models  or  theories  of  second  language  acquisition  implicitly  or 
explicitly  underlie  the  pedagogical  methods  traditionally  used.   Most 
teaching  strategies  have  taken  their  models  from  the  findings  of 
linguistics;  that  is,  the  description  of  the  sounds  or  sound  systems  of 
particular  languages  and,  often,  contrastive  analyses  of  sound  systems. 
The question posed by teachers was, "What is it that needs to be
learned?" It is only recently that advances have been made both in the
understanding of the implications of the question and in the specification
of an answer.

In  a  1979  dissertation,  Flege  first  began  his  discussion  of  a  more 
adequate  model.   In  that  study  he  analyzed  the  foundations  of  teaching 
strategies or methods to determine what linguistic models of pronunciation
acquisition in a second language were implied. The theory or
model  of  sound  acquisition  was  not  always  explicitly  stated  in  the 
teaching  strategy  or  model.  Flege  developed  his  model  (1979)  in 
response  to  the  shortcomings  of  traditional  models,  although  he  sees 
(1986)  that  the  current  position  is  merely  a  beginning.   The  following 
is a summation of Flege's analysis and the development of his model as
it was set forth in his 1979 dissertation.

Flege's central claim in his 1979 dissertation was that predictions
concerning  a  learner's  pronunciation  needed  to  take  into  account  at 
least  some  previously  ignored  phonetic  differences  between  languages 
because  those  models  that  are  based  on  abstract  phonological  (or 
phonemic)  categories  are  not  always  complete  or  accurate.   Before 
examining  the  argument  Flege  developed  in  his  dissertation,  it  is 
necessary  to  look  briefly  at  the  kinds  of  models  he  had  in  mind.   Flege 
divided the models underlying teaching strategies into two types: those
developed by phoneticians and those applied by linguists (usually based on
structural  linguistics).   Both  general  models  were  found  by  Flege  to  be 
inadequate  in  accounting  for  second  language  sound  acquisition. 


Phonetic  Models 

By Phonetic Models, Flege (1979) meant applied works by phoneticians.
Flege reviewed older works such as those by Jones and Delattre.
These writers stress identification of articulatory patterns for sound
production as well as discrimination training. Drill in perception and
production is expected to produce new articulations that will
generalize in language use. Flege refers to this method as the
"corrective phonetics approach." It was felt that learners could use
their knowledge of the sounds of their native language to "correct" the
articulatory position for sounds in the target language. For example,
to achieve /I/ production (which is difficult for most non-English
speaking students), the usual procedure is to establish the articulation
of /i/ and /e/ and to ask the student to position the tongue
midway between the two. Some reference is also made to lip configuration
and, very generally, to tongue tension.

Flege  (1979)  finds  that  practitioners  of  a  purely  phonetic  approach
have  no  explicit  theory  of  acquisition.  The  speaker  merely  adds  new 
motor  placements  and  movements  to  an  existing  repertoire.   Learning, 
then,  proceeds  as  habits  are  established.   In  this  model,  sounds  need 
not  be  organized  or  categorized  except  for  the  convenience  of  the 
phonetician  or  the  teacher,  since  production  of  one  sound  does  not  seem 
to  depend  on  production  of  other  sounds. 

The  phonetic  approach  remains  a  strong  component  of  more  recent 
applied  works  on  pronunciation  (Buck  &  Alterbaum,  1983;  Schmidt  &  Kass, 
1986,  for  example),  though  it  is  usually  a  minor  aspect  of  the  teaching 
strategy. Current works incorporate articulatory specification of
target sounds, discriminated by drill and practiced in ever more
complex contexts. A recent work in this tradition states, "The
fundamental method by which a student learns to pronounce English is by
imitating the pronunciation of English-speaking persons . . ." (Prator
& Robinett, 1985, p. 1). The model applied in the Prator and
Robinett (1985) text seems to imply learning by acquiring new motor
habits. The Phonetic Model, implicit or explicit (as in the "imitation"
method of Prator and Robinett [1985]), seems to picture the
learner  developing  sets  of  language-specific  sounds  that  can  be 
accessed  for  use  in  speaking  a  particular  language.  For  a  speaker  with 
a  foreign  accent,  the  Phonetic  Model  would  explain  the  accent  by 
pointing  to  inaccurate  motor  movements.   Virtually  all  current 
applications  in  teaching  include  the  use  of  at  least  some  articulatory 
(sometimes  physiological)  material,  though  not  always  in  a  systematic 
fashion. 

Flege's  analysis  in  his  1979  dissertation  considered  only  older 
texts  by  phoneticians.   I  have  included  some  newer  works  to  show  that 
the  theme  of  correcting  articulatory  habits  remains  productive. 
Specification  of  phonetic  detail,  which  may  play  little  or  no  part  in 
phonological  systematization,  has  been  found  to  be  necessary  in
teaching  though  it  is  often  dispensed  with  in  models  of  second  language 
sound  acquisition. 

Phonological  Models 

By  Phonological  Models,  Flege  (1979)  meant  segment  based  systems. 
This approach asserts that phonetics deals with the surface (or
motoric) realization of an abstract underlying phonology and that the
mental  organization  of  the  abstract  system  is  what  causes  problems  in 
learning.  "Interference"  also  becomes  a  factor  when  phonological 
systems  are  invoked. 

Interference,  to  most  second  language  researchers,  means  that  the 
previously  organized  system  of  phonological  contrasts  will  somehow 
influence  the  acquisition  of  a  new  system  of  contrasts.  This  is 
traditionally  taken  to  mean  that  words  in  the  target  language  will  be 
produced  using  segments  and  allophones  from  the  native  language. 
Problems  caused  by  interference  occur  when  the  segmental  inventories  do 
not  match.   Interference  would  not  be  a  problem  in  the  previously 
discussed  Phonetic  Models,  since  articulatory  patterns  are  not  usually 
systematically  related  to  each  other.

In his 1979 discussion, Flege dealt only with the tradition of
structural linguistics and the phoneme as developed by linguists such as
Trubetzkoy, Weinreich, and Lado. He was interested in theoretical linguistic
sources as they were applied to second language pronunciation acquisition
rather than in purely theoretical sources.

Generally, in Phonological Models, linguistic systematization
posits categories of phonemes with allophones in a network of relationships
that defines the particular language. Contrastive analysis is
supposed to explain and predict difficulties the learner will experience.

Flege  (1979),  in  his  dissertation  study,  found  four  specific 
difficulties  with  contrastive  phonemic  models  of  pronunciation.   First, 
these  models  may  predict  that  learners  will  have  difficulty  with 
specific sounds because of structural differences between the first and
second language. For example, it is true that native English speakers
who learn languages such as Vietnamese have difficulty with word
initial /ŋ/ because of a rule of English that does not permit the sound
in  word  initial  position,  though  the  sound  is  used  in  English  in 
syllable  final  position.  However,  contrastive  analysis  cannot  predict 
degree  of  difficulty  for  a  range  of  different  sounds  to  be  learned, 
even  though  some  sounds  do  seem  to  be  more  difficult  to  learn  than 
others  (Briere,  1966).  Within  the  Phonological  Model,  phonemes  or 
bundles  contrast  or  do  not  contrast.   There  is  no  room  for  "slightly 
different"  sounds. 

Second,  Flege  claims  that  there  is  asymmetry  of  error  between 
languages.  That  is,  a  difference  between  two  languages  should,  in 
contrastive  theory,  create  problems  for  speakers  of  both  languages. 
That  is  not  always  the  case.   Moulton  (1962)  discusses  the  contrastive 
difference  in  word  final  obstruent  voicing  between  German  and  English. 
It  seems  that  while  native  German  speakers  have  difficulty  maintaining 
voicing  in  this  position  when  speaking  English,  native  English  speakers 
have  no  difficulty  acquiring  the  rule  for  neutralization  of  voicing 
when  learning  German.

Flege's  third  difficulty  involves  the  inability  of  phonemic  models
to  predict  which  sounds  in  the  target  language  will  be  identified  with 
which  particular  sounds  in  the  first  language.   Flege  uses  the  example 
of  a  common  American  substitution  of  [u]  for  French  [y]  when  the 
American  is  learning  French  and  finds  that  phonemic  theory  offers  us  no 
reasons  why  [i]  is  not  substituted  instead. 

The  last  difficulty  Flege  cited  in  his  1979  dissertation  is 
variability of error. Development of a sound system by a learner may
take place (if it does take place) through continuous approximation of
the  correct  sound  rather  than  through  abrupt  mastery  of  a  sound. 
Segmentally  based  models  cannot  predict  a  course  of  gradual  or 
individual  acquisition,  since  there  is  no  way  both  to  "have"  a  segment 
and  not  to  "have"  it  at  the  same  time. 

Summary 

Flege,  in  his  1979  dissertation,  examined  the  models  of  sound 
acquisition  developed  in  the  applied  works  of  classical  linguists  and 
phoneticians.   He  looked  at  models  based  on  phonetic  approaches  and 
models  based  on  phonemic/phonological  approaches.   Flege  claimed  that 
Phonetic  Models  assume  that  some  sounds  will  be  difficult  to  produce 
because  they  may  be  more  difficult  to  articulate  or  because  the  motor 
habits  used  in  producing  them  are  new.   Thus,  this  approach  cannot 
account for the fact that Americans find it difficult to produce word
initial /ŋ/, an item already in the English motor habit system.

Contrastive  Phonemic  Models,  of  the  type  considered  by  Flege,  on 
the  other  hand,  are  usually  based  on  the  assumption  that  the  only  types 
of  interference  in  learning  caused  by  knowledge  of  the  first  language 
are  those  related  to  systematic  phonological  differences.   This  view 
cannot explain the fact that though the Thai system of voiceless
initial stop consonants contains both phonemic aspirated and unaspirated
word initial /p/, Thai speakers learning English will fail to
consistently use the correct aspirated [pʰ] required in word initial
position in English (Linananda, 1964).

In  other  words,  traditional  Phonological  Models  predict  the  effect 
of foreign accent based on contrasts that apply within the language as
sets, but may not apply equally well across languages. Flege does not
reject contrastive analysis and states that contrastive analysis can
explain some problems in pronunciation but that

    A reasonable and testable hypothesis is that much
    interference in second language learning is due simply
    to the maintenance of phonetic rules of implementation
    in the second language. Phonetic interference,
    unlike the interference defined by phonemic-interference
    models, is not restricted to phonologically
    distinctive features, but may result from
    any sub-phonemic or phonemic differences between
    speech sounds in the native and target languages
    (1979, p. 45, emphasis in the original).

Flege  (1979)  discussed  a  proposed  model  that  seems  to  account 
for  many  of  the  "exceptions"  to  the  types  of  predictions  or  rules  that 
are  possible  in  the  models  discussed  in  this  chapter.   Recently,  other 
researchers  have  proposed  similar  modifications  to  early  models. 
Flege's proposed model and evidence put forward by other researchers
that seems to support Flege's model (or one like it) will be reviewed
in  the  following  sections  of  this  chapter.   The  present  study  proposes 
to  test  one  aspect  of  such  a  model  in  a  more  comprehensive  manner  than 
the  model  has  been  tested  previously. 

Phonetic  Interference  Model 

This section will be concerned with Flege's proposed Phonetic
Interference Model. Evidence from other research that seems to support
Flege's proposal will also be discussed.

Flege  (1979)  found  problems  with  two  types  of  Phonetic  and 
Phonological  Models  of  second  language  sound  acquisition  as  they  had 
been  traditionally  developed.   Therefore,  he  proposed  that  models  of 


-14- 
acquisition  should  be  developed  that  take  into  account  what  seem  to  be 
surface  phonetic  as  well  as  abstract  phonological  explanations.   In 
order  to  provide  experimental  evidence  for  this  view,  he  investigated 
stop  productions  in  English  by  speakers  of  Arabic  to  determine  which 
differences — phonetic  or  phonological — caused  pronunciation  problems 
for  L2  learners.  The  phonological  contrast  investigated  was  the 
absence  of  /p/  in  the  phonological  system  of  Arabic.   The  phonetic 
contrast  investigated  was  the  different  implementation  of  the  feature 
[voice]  in  stops  between  Arabic  and  English. 

Flege  found  that  speakers  of  Saudi  Arabic  were  able  to  produce  an 
adequate  /p/  sound,  indicating  that  they  could  set  up  an  abstract 
system  of  stop  contrasts  similar  to  the  English  stop  system.   The  /p/ 
produced  by  the  L2  speakers  shared  in  the  feature  [-voice]  with  English 
/p/.  Therefore,  the  problem  was  not  phonological.   However,  in  a 
confusion  matrix,  native  English  speakers  often  confused  Saudi  /p/  with 
/b/  (22%  of  the  time  in  initial  position,  49%  of  the  time  in  final 
position). This supports Flege's argument that a phonological explanation
is not completely adequate to explain foreign accent because
segmental  considerations  cannot  explain  the  confusion,  given  that  a 
voiceless  /p/  was  produced.  Flege  then  analyzed  the  Arabic  productions 
acoustically.  He  found  problems  in  implementation  of  stop  voicing  in 
terms  of  timing  of  glottal  pulsing,  relative  closure  duration  between 
/p-b/,  and  relative  vowel  duration  before  voiced  and  voiceless  stops 
(details  of  this  acoustic  study  will  be  described  later).   The  Saudis 
were  implementing  these  phonetic  durations  as  they  would  for  a 
hypothetical  Arabic  voiceless  stop  rather  than  as  is  done  in  English. 
Thus, explaining acquisition of English by Arabic speakers necessitates
the use of what would generally be considered subphonemic or phonetic
details  of  implementation. 

Phonetic  Interference 

What  is  meant  by  "phonetic  interference"?  Flege  (1979)  in  the 
quotation  given  above  (p.  13)  proposes  that  the  maintenance  of  phonetic 
rules of implementation for L1 causes problems in producing accurate
sounds in L2. How do we clearly differentiate phonological rules from
phonetic  rules  of  implementation? 

Port  and  others,  in  a  series  of  articles  (Port,  Al-Ani,  &  Maida, 
1980;  Port  &  Mitleb,  1983;  Mitleb,  1981;  1984a;  1984b;  Flege,  1979; 
1981;  1986)  have  argued  that  segment  based  phonetics  and  phonology 
(whether  "phonemes"  or  "bundles  of  features"  are  basic  units  being 
discussed)  cannot  deal  with  certain  subsegmental  or  suprasegmental 
language  specific  phenomena,  which  include  what  are  generally  called 
temporal  implementation  rules  or  phonetic  rules  of  implementation. 
Part  of  the  argument  was  described  above  as  it  was  delineated  in 
Flege's 1979 dissertation. These researchers all begin with the
Chomsky  and  Halle  (1968,  p.  295)  formulation  concerning  phonetics  and 
phonology  which  they  take  as  implying  that  temporal  implementation  of 
phonetic  features  will  be  supplied  by  universal  rules. 

Some  temporal  effects  may  be  due,  for  example,  to  coordination  of 
muscles  (neurologically  or  mechanically)  for  production  of  feature 
bundles.   However,  Port  and  colleagues  claim  that  while  some  durations 
seem,  indeed,  to  be  universal,  other  durational  effects  are  patterned 
within  the  particular  language.   Language  specificity  has  been  claimed 
for voice onset time (VOT), the duration of vowels preceding voiced and
voiceless  consonants,  and  the  relative  closure  duration  of  final  stop 
consonants  after  different  vowels,  among  other  parameters.  Rules  for 
these  effects  must,  then,  be  part  of  the  phonology  of  the  language 
rather  than  supplied  by  a  universal  phonology. 

Phonetic  Interference,  then,  seems  to  refer  to  nonsegmental 
differences  between  languages.  A  good  example  of  subsegmental 
differences  is  the  feature  [voice].   If  a  segment  is  considered  to  be 
an  abstract  bundle  of  binary  features,  there  seems  to  be  no  room  for 
specification  of  the  amount  of  voicing  (or  of  the  timing  of  voicing) 
that  should  be  present  for  a  bundle  to  be  considered  [+voice].  Chapter 
2  will  look  more  closely  at  this  example.   Other  examples  of  Phonetic 
Interference  are  discussed  in  the  next  section. 

Other  Evidence  for  A  Phonetic  Interference  Model 

Current  research  is  beginning  to  compile  evidence  for  the  inclusion 
of  specific  noncontrastive  yet  distinctive  material.   Some  examples  are 
given  below,  which  seem  to  support  some  kind  of  Phonetic  Interference 
Model. 

Briere  (1966)  argued  for  the  use  of  nonsegmental  (that  is,  supra- 
or  subsegmental)  material.   Briere  was  interested  in  the  problem, 
discussed  above,  of  predicting  comparative  degree  of  difficulty  for  the 
acquisition  of  sounds.   He  states  that  "precisely  that  articulatory 
feature  which  is  being  ignored  as  redundant  for  classification  may 
frequently  be  paramount  in  determining  the  degree  of  difficulty 
experienced  by  both  encoder  and  decoder  when  confronted  with  the 
phonetic  reality  of  a  target  language  sound"  (p.  770). 

Briere  asked  native  American  English  speakers  to  imitate  unfamiliar 
French,  Arabic,  and  Vietnamese  sounds.   Native  speakers  judged  the 
productions.   Briere's  resulting  hierarchy,  based  on  mean  number  of 
correct  learning  and  testing  trials  could  be  explained,  in  some  cases, 
by  looking  at  the  native  and  target  systems  in  detailed  articulatory 
terms.   Though  producing  a  high  back  unrounded  vowel  requires  only  the 
use  of  features  already  present  in  English,  this  sound  seems  to  be  very 
difficult  to  master,  as  difficult  as  the  production  of  a  pharyngeal 
fricative,  which  would  require  new  articulatory  habits  for  native 
speakers  of  English.   (These  two  sounds  were  found  to  be  the  most 
difficult  to  produce  for  the  nonnative  speakers.) 

Hutchinson  (1973)  looked  at  acquisition  of  English  stress  and  final 
syllable  duration  by  Spanish  speakers.  She  found  that  8  of  10  Spanish 
speakers  who  were  taught  only  segmental  aspects  of  English  did  not 
lengthen  their  final  syllables  appropriately  for  English,  retaining 
instead  the  pattern  of  Spanish.   Phonetic  interference,  in  this  case, 
seems  to  be  due  to  differences  in  rules  for  final  syllable  lengthening 
in  English  and  Spanish.   Specifications  of  amount  of  final  syllable 
lengthening  must,  then,  be  included  in  some  fashion  in  the  phonologies 
of  either  those  languages  that  require  it  or  those  languages  that 
neutralize  syllable  final  lengthening. 

Mitleb (1981, 1984a) also states the case for phonetic interference,
and on much the same grounds as Flege (1979). Mitleb (1984),
arguing against the universal segmental and segment feature phonetics
of Chomsky and Halle (1968, p. 295), states that "temporal implementation
differences [i.e., nonsegmental] exist between languages and
presumably are part of the linguistic knowledge of the speakers . . ."
(p.  234).  The  author  compared  vowel  length  contrast  (long/short)  in 
Arabic  and  English.  He  found  that  while  vowel  height  temporality  seems 
to  be  universal,  as  has  been  claimed  (that  is,  low  vowels  are  always 
longer  than  high  vowels),  the  long/short  vowel  length  contrast  related 
to  the  voicing  of  the  following  consonant  seems  to  be  language  specific 
and  not  physiologically  determined,  as  had  been  previously  supposed. 
Arabic,  then,  has  long  and  short  vowels  that  are  phonemic  and  of  a 
particular  length  relative  to  each  other.  English  has  long  and  short 
vowels  that  depend  on  context  and  whose  length  changes  relative  to,  in 
most  cases,  the  following  sound.   Arabic  speakers  speaking  English 
maintain  the  high/low  vowel  length  difference  but  substitute  their 
long/short  contrast  for  the  context  dependent  contrast  found  in  English 
(Mitleb,  1981,  1984). 

Port, Al-Ani, and Maida (1980) argue for the existence of language-specific
timing rules that cannot be captured in current segmental
formulations  of  the  sort  proposed  by  Chomsky  and  Halle  (1968,  p.  295). 
The  authors  state  that  the  postulation  of  universal  implementation 
rules  operating  on  abstract  phonetic  segments  does  not  allow  for 
language  specific  temporal  patterns  that  cannot  be  captured  "with 
differences  in  purely  segmental  features"  (p.  237). 

Japanese and Arabic were examined in the Port et al. (1980) study.
The authors found that medial /d/ and /t/ were of comparable
length in Arabic, whereas in Japanese medial /d/ is considerably shorter in
duration than /t/. Japanese uses fairly constant syllable
durations  requiring  timing  adjustments  to  vowels  and/or  consonants 
above  and  beyond  what  is  accepted  as  "inherent."  In  Japanese,  the 
preceding  and  following  vowels  seem  to  compensate  for  the  medial 
consonant  length  in  order  to  meet  the  syllable  length  requirement.   In 
Arabic,  however,  only  the  preceding  vowel  varies  by  consonant.  This 
adjustment  applies  only  before  the  intersyllabic  poststressed  consonant 
studied  here.   Arabic  shows  no  difference  in  vowel  duration  in  other 
contexts  (Mitleb,  1984b;  Port  &  Mitleb,  1983).  These  findings  are 
taken  as  evidence  for  phonetic  interference,  which  supports  the  notion 
of  language-specific  rules  of  timing.   This  is  especially  important 
since  temporal  compensation  seems  to  extend  across  syllables  in  cases 
such as the Japanese one mentioned above. It is the claim of the Port
et al. (1980) study that what they call "temporal macrostructures"
cannot  be  described  in  segment  based  models  of  phonetics  and  phonology. 

A  related  example  is  presented  by  Bush  (1967).  American,  British, 
and  Indian  English  were  compared  to  determine  whether  or  not  the 
duration  ratios  of  sounds  in  stressed  syllables  in  each  dialect  might 
contribute  to  the  effect  of  dialect.  Nonsense  VCVC  syllables  were 
produced  by  the  speakers  in  this  study.   The  author  measured  the  ratio 
of  the  duration  of  the  intervocalic  consonant  to  the  duration  of  the 
stressed  vowel  and  the  ratio  of  the  duration  of  the  consonant  closure 
to  the  duration  of  the  consonant  release.  Clear  differences  were  seen 
in  these  ratios  for  the  three  dialects.  Overall,  Americans  tended  to 
produce  longer  syllables  than  did  the  British,  who  produced  longer 
syllables  than  the  Indians.   However,  Bush  found  that  it  was  the  ratios 
mentioned  above  that  produced  the  differences  rather  than,  for  example, 
total  vowel  duration  in  itself.   In  the  case  of  the  consonant  closure 
versus  consonant  release  ratio,  the  reported  ratios  for  /p/  were 
American  2:1,  British  2.5:1,  and  Indian  4:1.   The  evidence  in  this 
study seems to indicate that phonetic interference may extend across
segments (consonant/vowel ratio) as well as subsegmentally (consonant
closure/release ratio).
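
The two ratios described for the Bush (1967) study can be illustrated with a minimal sketch. The durations below are hypothetical values chosen only so that the closure/release ratios reproduce the reported 2:1, 2.5:1, and 4:1 pattern for /p/; they are not Bush's measurements. Treating the duration of the intervocalic consonant as closure plus release is likewise an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Segment durations (ms) for one stressed token. All values are hypothetical."""
    vowel_ms: float    # stressed vowel
    closure_ms: float  # closure of the intervocalic consonant
    release_ms: float  # release of the intervocalic consonant


def consonant_vowel_ratio(t: Token) -> float:
    """Consonant duration (assumed here to be closure + release) over vowel duration."""
    return (t.closure_ms + t.release_ms) / t.vowel_ms


def closure_release_ratio(t: Token) -> float:
    """Consonant closure duration over consonant release duration."""
    return t.closure_ms / t.release_ms


# Invented durations, NOT data from Bush (1967).
tokens = {
    "American": Token(vowel_ms=150.0, closure_ms=80.0, release_ms=40.0),
    "British": Token(vowel_ms=130.0, closure_ms=75.0, release_ms=30.0),
    "Indian": Token(vowel_ms=110.0, closure_ms=80.0, release_ms=20.0),
}

for dialect, tok in tokens.items():
    print(dialect,
          round(consonant_vowel_ratio(tok), 2),
          round(closure_release_ratio(tok), 2))
```

The point of the sketch is that both measures are relative: two speakers with the same closure duration (here, the hypothetical American and Indian tokens) can still differ sharply in the closure/release ratio.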

Recently, Keating (1985), in a chapter entitled "Universal phonetics
and  the  organization  of  grammars,"  discussed  the  general  question 
asked  in  the  beginning  of  this  chapter  concerning  the  relation  between 
phonetics  and  phonology.   Keating  (1985)  makes  the  strong  claim  that 
all  phonetic  phenomena  are  controlled  by  rule  rather  than  resulting 
from  the  physical  requirements  of  the  speech  mechanism.   "It  seems 
likely  that  there  are  no  true  linguistic  phonetic  universals  and  that 
the  grammar  of  a  language  controls  all  aspects  of  phonetic  form" 
(p.  129).  Among  the  types  of  evidence  that  Keating  offers  to  support 
this  claim  are  facts  concerning  extrinsic  vowel  duration.   Extrinsic 
vowel  duration  includes  contextual  effects  such  as  the  changing 
duration  of  vowels  before  voiced  and  voiceless  stops. 

In  an  earlier  study  (Keating,  1979),  the  author  had  found  that 
Polish  and  Czech  do  not  exhibit  the  supposedly  universal  differential 
vowel  duration  based  on  the  voicing  of  the  following  stop.   (This 
finding  confirms  the  findings,  described  above,  by  Flege  (1979)  and 
Mitleb  (1984)  concerning  the  lack  of  this  vowel  distinction  in  Arabic.) 
Keating  (1985)  described  the  physiologically  based  argument  that  claims 
that  the  vowel  duration  effect  is  explained  by  the  fact  that,  in 
English,  the  following  stops  are  inversely  related  in  duration  to  the 
length  of  the  vowel.   That  is,  a  shortened  vowel  will  be  followed  by  a 
voiceless  stop  with  a  long  closure  duration  while  a  longer  vowel  will 
be  followed  by  a  voiced  stop  with  a  shorter  closure  duration.   Polish, 
Keating  (1979)  found,  showed  the  differential  closure  duration  based  on 
stops  but  not  the  differential  vowel  effect.   Keating  (1985)  takes  this 
finding  to  indicate  that  the  two  durations  (vowel  and  consonant)  are 
not  physiologically  determined  by  each  other.   This  type  of  evidence 
supports Keating's claim (1985) and the claim of Flege (1979) that the
phonology  of  a  language  needs  to  take  into  account  what  had  been 
previously  considered  to  be  phonetic  "detail."  "Rules  of  phonetic 
vowel  duration  as  a  function  of  voicing  of  a  following  consonant  must 
be  language  specific"  (Keating,  1985,  p.  122). 

Research is obviously necessary to determine the relevant dimensions
of differences between languages (Flege, 1986). Some differences
may  turn  out  to  be  inconsequential  for  accent  detection.   For  example, 
in  the  Briere  (1966)  study  mentioned  above,  the  author  states  that 
acoustic  analysis  of  subjects'  productions  in  terms  of  length  of 
aspiration  versus  closure  duration  showed  wide  variations  in  items  that 
were  rated  "near  native  proficiency"  by  the  judges.  Briere  does  not 
provide the data on which he based this statement. Preliminary
research that has been done and that is cited here needs to be
interpreted in light of the new model, and new data need to be
collected.

Summary 

One  question  regarding  a  model  for  acquisition  has  been  discussed: 
What  levels  of  sound  organization  must  be  taken  into  account?  Evidence 
has  been  discussed  that  seems  to  show  that  a  model  must  account  for  a 
range  that  includes  subphonemic  and  supraphonemic  "detail." 

M.Y.  Liberman  (1983),  in  a  chapter  entitled  "In  favor  of  some 
uncommon  approaches  to  the  study  of  speech,"  also  argues  (from  the 
basis of work on intonation) that the traditional separation of
phonetics and phonology currently serves neither field well, especially
in speech production research. He says:

Regardless  of  background,  all  students  of  intonation 
must think for themselves about what basic categorization
of intonational phenomena should be before
they  can  even  begin  an  informal  investigation. 
Their  research  is  (or  should  be)  constantly  drawn 
back  to  this  fundamental  question:  Each  advance  in 
the  basic  categories  of  description  permits  the 
interpretation  of  a  broader  range  of  data,  which 
often  suggests  a  new  modification  of  the  initial 
descriptive  assumptions  (p.  268). 

The  Phonetic  Interference  Model  described  above  seems  to  have  been 
developed  in  response  to  a  generally  felt  need  on  the  part  of  second 
language  researchers  and,  perhaps,  others  in  more  traditional  areas. 
As  Liberman  points  out,  there  are  good  and  compelling  reasons  why 
phoneticians  and  phonologists  have  proceeded  from  different  viewpoints. 
But,  at  this  point  in  time,  a  "hybrid  theory"  (Liberman,  1983)  may  be 
productive  for  both  fields. 

Kent's  1983  chapter  on  "The  Segmental  Organization  of  Speech" 
concludes: 

There is good reason to believe that speech is a
complex recoding of a linguistic message into movements
of the speech organs and the acoustic signal
of speech. Furthermore, the nature of this recoding
seems to be such as to obscure certain segmental
properties in a way that is neither predicted nor
explained by linguistic analysis per se . . . All of
the major processes and influences must be considered
in order to arrive at a sensible description of
speech behavior (pp. 85-86).

Kent  seems  to  be  suggesting  a  need  for  consideration  of  more  data 
in  a  linguistic  analysis  than  has  traditionally  been  considered 
appropriate.   This  section  offered  arguments  for  a  wider  interpretation 
of  phonology  in  linguistic  analyses  based  on  evidence  from  second 
language  pronunciation  acquisition.  Flege's  (1979)  model  seems  to 
offer  possibilities  for  research  into  phonology,  which  seems  to  be 
needed  in  second  language  acquisition,  by  suggesting  a  framework  within 
which  hypotheses  may  be  tested. 

A  Prediction  of  the  Phonetic  Interference  Model 

This  section  will  be  concerned  with  one  claim  of  the  proposed 
Phonetic  Interference  Model.   This  claim  will  be  used  as  the  basis  for 
the  present  research. 

In 1981, Flege proposed a "Phonological Translation Hypothesis"
based on his (1979) Phonetic Interference Model. This hypothesis
states that adults learning a second language (L2) base their interpretation
of the sounds of L2 (and, thus, their production of the sounds
of L2) on the sounds found in their native language (L1). Further, the
perceptual target (or phonetic representation or prototype or acoustic
model) that is implemented in L2 will be provided jointly by L1 and L2,
resulting in productions that conform entirely to neither in some
aspects. The following section reviews some experimental data on
production  that  seem  to  suggest  that  such  mergers  of  categories  do 
happen. 

Williams  (1980)  found  that  Spanish-English  bilinguals  produced 
initial  /b/  in  English  with  average  VOT  values  that  fell  between  the 
average  values  normally  found  in  Spanish  and  English.   Caramazza  et  al. 
(1973)  examined  the  prevocalic  stop  consonants  of  French-English 
bilinguals. Although the bilinguals matched unilingual French speakers
in VOT in French, when speaking English they used intermediate values
for  VOT  that  were  closer  to  those  found  in  French  than  English.  Mack 
(1982)  also  looked  at  French-English  bilinguals.   In  this  case,  vowel 
duration  in  the  two  languages  produced  by  bilinguals  was  compared.   The 
speakers  produced  vowel  durations  in  English  that  were  longer  than 
those  required  for  French  but  still  shorter  than  necessary  to  conform 
to  the  pattern  of  English.   Finally,  Flege  and  Hillenbrand  (1984)  found 
that  experienced  French-English  bilinguals  produced  /t/  with  a  VOT 
intermediate  to  the  values  found  in  English  and  French. 

When  bilinguals  learn  a  second  language,  then,  they  do  not  merely 
produce  segments  according  to  the  specifications  of  their  first 
language  but  tend  to  make  changes  in  many  aspects  of  the  segment  and 
its  relationship  to  other  segments  over  time. 
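
One way to make "intermediate" concrete is to express a bilingual's mean value as a position on a scale running from the L1 norm (0) to the L2 norm (1). The following is a minimal sketch under that assumption; the function name and the millisecond values are hypothetical and are not taken from any of the studies cited above.

```python
def relative_position(observed: float, l1_norm: float, l2_norm: float) -> float:
    """Locate an observed mean between the L1 norm (0.0) and the L2 norm (1.0).

    Values below 0 or above 1 fall outside the range spanned by the two norms.
    """
    if l1_norm == l2_norm:
        raise ValueError("L1 and L2 norms must differ for the scale to be defined")
    return (observed - l1_norm) / (l2_norm - l1_norm)


# Hypothetical VOT means in ms (illustrative only, not reported data):
# an L1 norm of 20, an L2 norm of 80, and a bilingual mean of 40.
position = relative_position(observed=40.0, l1_norm=20.0, l2_norm=80.0)
print(round(position, 2))
```

On this scale, a value below 0.5 corresponds to a production closer to the L1 norm than to the L2 norm, which is the pattern described for the Caramazza et al. (1973) bilinguals.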

Limits  on  Accuracy 

The  central  phonetic  representations  (or  prototypes)  have,  in  some 
cases,  been  modified  by  L2  learning,  as  was  shown  in  the  previous 
section. Flege and his colleagues (Flege & Hillenbrand, 1984; Flege,
1981, 1986) claim that there is an upper limit on the modifications and
accuracy  that  can  be  achieved  in  L2  pronunciation. 

Port  and  Mitleb  (1983)  also  argue  for  limits  on  accuracy.   In  their 
study,  Arabic  speakers  were  able  to  acquire  the  American  English 
phonological  flapping  rule  for  syllable-final  apical  stops,  but  were 
unable  to  accurately  produce  temporal  implementation  of  VOT  in  initial 
stops  or  to  sufficiently  increase  vowel  duration  contrast  for  final 
voiced  and  voiceless  conditions  in  English.   The  authors  interpret 
these findings as evidence that phonological rules are more easily
altered in language acquisition in adults than are temporal implementation
rules. The major factor here is deemed by Port and Mitleb (1983)
to  be  the  plasticity  of  the  human  nervous  system. 

The  modifications  may  take  place  through  what  has  been  called 
"interlingual  identification"  (Weinreich,  1953)  in  which  auditory,  and 
perhaps articulatory, similarity of L1 and L2 sounds causes the speaker
of L2 to substitute L1 sounds. At least some sounds in L2 are
perceived to be sounds in L1 by the language learner. Flege (1984,
1987)  attributes  this  perception  to  a  mechanism  called  equivalence 
classification. Flege (1987) defined equivalence classification as
". . . a basic cognitive mechanism which permits humans to perceive
constant  categories  in  the  face  of  the  inherent  sensory  variability 
found  in  the  many  physical  exemplars  which  may  instantiate  a  category" 
(1987,  p.  49).   As,  and  if,  perceptions  change,  the  productions  will 
change  as  in  the  mergers  described  above.   Flege  and  his  colleagues 
predict  that  some  learning  will  take  place  but  that  "...  the 
previous  phonetic  experience  impedes  the  formation  of  accurate 
perceptual  targets  for  phones  in  L2"  (Flege  and  Hillenbrand,  1984, 
p. 718). Learners will judge two sounds, one from L1 and one from L2,
to  be  "acoustically  different  realizations  of  the  same  category" 
(p.  718).   It  appears  that  Flege  is  claiming  that  the  two  competing 
sound  systems  exert  a  kind  of  continual  pull  in  different  directions 
and  that  the  new  categories  formed  by  the  speaker  can  only  consist  of  a 
kind  of  compromise.   Flege  is  referring  here  to  "similar"  sounds  in  L2 
rather than "new" sounds. The next section will consider the difference
between "new" and "similar" sounds.

New  Versus  Similar  Sounds 

There  is  a  sense  in  which  some  sounds  in  L2  will  be  "new"  sounds 
for the student and some will be "similar" to sounds in L1. Exactly
what  is  meant  by  these  terms  has  not  been  clearly  defined.   Sounds 
that  are  transcribed  using  the  same  IPA  symbol  may  be  acoustically 
identical,  yet  often  they  are  not.   There  is,  of  course,  a  range  of 
acceptable  variation  even  among  speakers  of  the  same  language. 
However,  Flege  (1984)  showed  that  unsophisticated  listeners  are  able  to 
detect  the  differences  between  English  and  French  /t/  on  hearing  only 
single  syllables.   Flege  played  samples  ranging  in  length  from  (1) 
phrases,  to  (2)  portions  of  syllables,  to  (3)  the  first  30  ms  of  "two" 
for  sophisticated  and  naive  listeners.  The  most  sophisticated 
listeners  (one  phonetics  course)  were  able  to  detect  accent  reliably 
from  only  the  burst  of  the  /t/.  Flege  takes  these  results  as  evidence 
for  very  closely  specified  prototypes,  which  will  determine  what  is 
"new"  and  what  is  "similar"  to  the  language  learner. 

Flege  (1986)  defined  "new"  sounds  as  those  which  "do  not  bear 
sufficient resemblance to any sound in L1 that they are likely to be
regarded by the experienced L2 learner as belonging to an already
familiar L1 category" (p. 48). What Flege calls "similar" sounds have
a "close yet acoustically different counterpart in L1" (p. 48). What
turns  out  to  be  new  or  similar  will  depend  on  the  languages  being 
compared  and  may  be  meaningful  only  in  that  context.   Flege  and 
Hillenbrand  (1984)  note  that  "it  should  be  apparent  that  any  phone 
encountered  in  a  foreign  language — no  matter  how  exotic — is  likely  to 
bear  some  degree  of  articulatory  and  acoustic  similarity  to  phones 
found in the learner's L1" (p. 708).

Problems  arise  in  deciding  the  category  to  which  an  L2  sound 
belongs.  Perhaps  most,  if  not  all,  L2  sounds  should  be  considered 
similar  until  proven  otherwise.   This  assumption  would  imply  that  in 
the  Phonetic  Interference  Model,  all  sounds  produced  by  the  L2  speaker 
will consist of a compromise between the L2 sound and a "similar" L1
sound. 

In  a  recent  paper,  Flege  (1987)  attempted  to  examine  further  the 
new/similar  distinction  as  it  relates  to  equivalence  classification. 
Two  French  sounds,  /u/  and  /y/,  as  produced  by  groups  of  native  English 
speakers with various levels of experience in the L2, were used to test
the  difference  in  acquisition  between  new  and  similar  sounds.   Flege 
classified  /u/  as  a  similar  sound  because  it  occurs  in  English  and 
classified  /y/  as  a  new,  non-English  sound.   He  predicted  that  /y/ 
would  be  produced,  by  the  most  experienced  speakers,  more  authentically 
than  would  /u/.  He  also  predicted  that  the  /u/  would  be  more  authentic 
when  produced  by  the  more  experienced  nonnative  speakers  than  when  it 
was  produced  by  less  experienced  nonnative  speakers.   Finally,  Flege 
examined  equivalence  classification  by  examining  the  production  of  /t/ 
by  bilingual  English  and  bilingual  French  speakers.   The  merger 
hypothesis  would  predict  that  the  native  English  speakers'  and  the 
native  French  speakers'  productions  of  /t/  in  their  native  language 
would  be  affected  by  merging  the  French  and  English  /t/. 

With regard to the /t/, Flege found that very experienced English
bilinguals  produced  a  contrast  in  VOT  values  for  the  English  and  French 
/t/  productions.   These  English  bilinguals,  however,  produced  a  French 
/t/  that  was  relatively  too  long  and,  more  interestingly,  produced  an 
English  /t/  that  was  relatively  too  short.  Both  experienced  English 
and  French  bilinguals  produced  a  new  kind  of  /t/  when  speaking  their 
native language. Flege takes this as strong evidence against the
interference theories that posit one-way effects (L1->L2) on production.
[...] different VOT values for English /t/ and French /t/. These sets of VOT
values, however, were correct for neither their L1 nor their L2. Flege
claims  that  this  happens  because  the  /t/  in  both  languages  is  similar 
though  not  identical  and  the  equivalence  classification  mechanism  is 
operating. 

For  the  vowels  /u/  and  /y/,  the  prediction  was  confirmed  that  the 
new  sound,  /y/,  was  produced  authentically  by  English  bilinguals  while 
the  similar  /u/  was  not.   (Authenticity  in  the  case  of  the  vowels  was 
related  to  second  formant  values.)  For  the  /y/,  the  second  formant  of 
the  most  experienced  English  bilinguals  did  not  differ  significantly 
from  the  second  formant  of  monolingual  French  speakers.   The  French  /u/ 
vowel  production  by  English  bilinguals  approximated  but  did  not  match 
the  norms  for  that  vowel  when  produced  by  monolingual  French  speakers. 
In  this  case,  however,  experience  with  the  L2  sound  did  not  seem  to 
affect  production  of  that  sound  in  English. 

Summary 

Flege  states:  "An  individual  might  learn  to  speak  a  foreign 
language  without  apparent  accent,  but  a  fine-grained  acoustic  analysis 
should  reveal  differences  between  the  language  learners  and  native 
speakers  along  those  parameters  where  phonetic  differences  exist 
between  the  native  and  target  language"  (1981,  p.  452).  This  statement 
is  the  basis  of  the  present  research. 

Flege  and  his  colleagues  have  proposed  that  accuracy  of  production 
of  L2  sounds  will  be  limited  in  some  respects.   They  claim  that 
"foreign  accent"  should  be  apparent  in  the  acoustic  analysis  of 
speakers  who  are  not  perceived  under  normal  circumstances  to  have  an 
accent.   From  the  evidence  reported  here,  timing  and  duration  of  the 
articulatory  gestures  (as  revealed  in  acoustic  analyses)  are  likely 
carriers of accent in L2, since languages may differ in the phonetic
specifications for a sound even when that sound is easily categorized
as an L1 sound (by IPA symbol, for example).

The  next  section  will  discuss  the  analysis  of  timing  and  duration 
as  a  way  of  testing  the  claim  discussed  in  this  section.   A  general 
hypothesis  will  be  offered  for  the  present  research. 

The  Hypotheses  of  This  Research 

One  way  to  examine  the  claim  that  L2  learners  will  never  achieve 
accuracy  in  pronunciation  of  the  second  language  is  to  investigate,  in 
detail,  the  productions  of  bilinguals  who  seem  to  have  little  or  no 
accent,  using  the  kind  of  fine-grained  acoustic  analysis  proposed  by 
Flege  (1981).  If  these  bilinguals  mirror  the  range  of  production  of 
native  speakers  in  L2,  then  an  important  part  of  the  model  needs  to  be 
reconsidered. "The most important prediction of Flege's (1981) model
is that L2 learners will never match monolingual native speakers of L2
because the prototypes they develop for L2 differ from those of L1
native  speakers"  (Flege,  1986,  p.  132).   If  the  bilinguals  do  not  match 
native  speakers,  further  evidence  will  exist  for  Flege's  proposed 
Phonetic  Interference  Model. 

A  possibility  is  that  these  bilinguals'  productions  will  closely 
approach  the  productions  of  native  speakers.   This  would  be  an 
important finding, because the effects of the merger of the two systems
need to be more clearly specified. The merger may effect a range of
new categories, beyond the midpoint of values between the two languages
and closer to the specifications of L2. The range of variation
possible  has  not  been  specified  for  any  one  language,  much  less  for 
bilingual  productions. 

In  this  study,  phonetic  implementation  of  several  duration  factors 
in English by speakers whose first language is Arabic, Spanish, Thai,
or Korean will be compared. The duration factors being tested will be
those  most  readily  available  for  comparison  in  English  and  the 
languages  above.  These  include 

a.  VOT 

b.  Medial  Vowel  Duration 

c.  Final  Closure  Duration 

d.  Vowel/Consonant  Ratio 

Chapter  2  will  be  concerned  with  specification  of  these  factors  and 
their  measurement  for  the  speakers  of  the  languages  to  be  studied. 

The  general  hypothesis  of  this  research  is  as  follows:  Even 
bilinguals  who  are  judged  to  have  no  accent  when  speaking  English  will 
show  significant  differences  from  monolingual  English  speakers  in 

A.  the  relative  timing  of  the  onset  of  glottal  pulsing  in  word 
initial  stop  consonants,  when  that  timing  is  different  in  their  native 
language; 



B.  the  relative  duration  of  stressed  vowels,  when  that  duration 
is  different  in  their  native  language; 

C.  the  relative  duration  of  the  final  consonant  closure  when  that 
duration  is  different  in  their  native  language; 

D.  the  relationship  between  the  duration  of  the  stressed  vowel  and 
the  voicing  of  the  following  stop  consonant,  when  that  relationship  is 
different  in  their  native  language. 


CHAPTER  2 
DURATION  FACTORS 

Introduction 


Excellent  speakers  of  a  second  language  (L2)  will,  of  course, 
learn  to  produce  sounds  that  are  not  used  in  their  first  language 
(L1). Stereotyped imitations of foreign accent easily reveal
noticeable deviations in segment use (and, sometimes, positional
allophone use). Familiar examples include /dʒɛs/ for "yes" by
Spanish  speakers  and  the  /r/-/l/  difficulties  of  many  native  speakers 
of  Asian  languages.   Excellent  speakers  of  English  as  a  second 
language  will  usually  achieve  acceptable  production  of  the  most 
commonly  noticed  segmental  deviations.   However,  even  after  segments, 
in  themselves,  have  been  mastered,  the  perception  by  native  speakers 
of  foreign  accent  often  remains.  Models  of  sound  acquisition  for 
second  languages  have,  until  recently,  offered  limited  explanations 
for  the  acoustic  effect  of  foreign  accent  after  the  acquisition  of  L2 
segments. This may be, according to Flege (1979), because
segment-based phonologies do not take into account some important
aspects of production differences between sound systems. Chapter 1
discussed more recent models by Flege and others which seem to invite
research to determine whether these models, and especially the
Phonetic Interference Model developed by Flege (1979, 1986), more
accurately reflect the acquisition process.


Chapter 1 reviewed evidence that timing and duration of segments
have been shown to play at least some role in producing the acoustic
effect of foreign accent. Researchers have concentrated their efforts
on English, and little is known about the details of duration of
segments and the relationships among durations of segments in other
languages, although this situation is changing. The present study
will examine the extent to which superior bilinguals have acquired
the American English temporal specification for some of these known
parameters of timing in producing English and for some known
interrelationships among these parameters in English. The Phonetic
Interference Model reviewed in Chapter 1 predicts interference in
acquisition of L2 from phonetic as well as from phonological aspects
of the L1. Flege (1986) specifically predicts that temporal accuracy
will be impossible because the L2 speaker will have developed
prototypes for sounds found in L1 that prohibit accurate reproduction
of similar sounds as specified in L2. Therefore, even superior
bilinguals will not completely revise their temporal implementation
rules in producing L2. The present study will examine superior
bilinguals in order to begin to test this prediction. The following
sections concern the parameters for timing to be examined in this
study.

Duration  of  Segments 

The purpose of the present study is to examine the duration of
some segments or parts of segments in the English spoken as a second
language by nonnative speakers in order to compare the productions of
second language users to the productions of native language users.
Typically, in this type of study, the researcher collects data from
sets of L1 and L2 speakers and assumes that the productions of the L1
speakers in the study are representative productions of the native
language in the particular context of the stimuli used in the study.
The aim is to find any systematic and reliable differences between
the production of a sound by the L2 speakers and the production of
that sound by L1 speakers within the context of the stimuli used.
These differences are then attributed to the influence of the
nonnative speaker's first language. The primary method of discovering
these variations in sound production is through examination of a
transformation of the acoustic signal into a visible, measurable
recording (spectrogram, waveform, etc.).

Perusal of the literature concerning the prediction of segment
duration in English (see the following sections) indicates, most
obviously, that timing is difficult to predict for specific individual
segments or groups of segments in running speech, because it may
involve (a) an interplay of factors, which may range from speaking
rate to the amount of emphasis placed on a particular word (Umeda,
1975, 1977), and (b) a range of acceptable values rather than
absolute values. Researchers working on a single language may report
the mean duration of a segment measured over several speakers or mean
durations of a single speaker's segments under different conditions.
Different segments do not seem to be affected in the same way by
context (Oller, 1973). Mean values for duration are used because of
the well-known variability that exists even in successive productions
of the same utterance by the same speaker. Group data are often used
for the same reason.
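
The reduction of repeated productions to a mean and a range can be
illustrated with a short sketch. The Python below is not part of the
study; the function name and the measurements are hypothetical and
serve only to show how several tokens of one segment might be
summarized.

```python
from statistics import mean


def summarize_durations(durations_ms):
    """Return (mean, minimum, maximum) for repeated measurements in ms."""
    if not durations_ms:
        raise ValueError("at least one measurement is required")
    return mean(durations_ms), min(durations_ms), max(durations_ms)


# Hypothetical repeated productions of one vowel by one speaker (ms).
tokens_ms = [112, 98, 105, 120, 101]
avg_ms, low_ms, high_ms = summarize_durations(tokens_ms)
print(f"mean {avg_ms} ms, range {low_ms}:{high_ms}")
```

The range is written in the low:high form used in the tables of this
chapter.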

Because the length of any given segment (for example, a particular
vowel) may be affected by many different factors, even the
comparison  of  vowel  duration  findings  across  different  studies  by 
different  authors  is  difficult.   The  published  findings  discussed 
below  are  reviewed  because  they  give  a  general  idea  of  proportions 
and  relations  exhibited. 

This  study  will  not  attempt  to  use  an  absolute  value  of  segments 
relative  to  some  standard  but  rather  will  consider  the  proportional 
relationship  between  the  duration  of  sets  of  segments.   For  the 
purposes  of  the  study,  the  following  sets  will  be  examined: 

1)  /p  -  b/:  relative  onset  of  glottal  pulsing  (VOT)  for  word 
initial  stops. 

2)  /i  -  I/:  relative  duration  of  tense/lax  vowels. 

3)  /t - d/: relative closure duration of the final consonant.

4)  vowel/consonant ratio: relative duration of /i, I/ before
voiced and voiceless consonants.
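
As an illustration of the proportional approach, the following Python
sketch computes two of the relations listed above. The durations are
invented for the example, and the definition of the vowel/consonant
ratio as vowel duration divided by closure duration is an assumption
made here, not a procedure taken from the study.

```python
def duration_ratio(numerator_ms, denominator_ms):
    """Return the ratio of two segment durations given in milliseconds."""
    if denominator_ms <= 0:
        raise ValueError("denominator duration must be positive")
    return numerator_ms / denominator_ms


# Hypothetical durations (ms) for one speaker; not data from the study.
tense_vowel_ms = 150.0   # /i/
lax_vowel_ms = 100.0     # /I/
vowel_ms = 180.0         # vowel before a voiced final stop
closure_ms = 60.0        # closure of that final stop

tense_lax_ratio = duration_ratio(tense_vowel_ms, lax_vowel_ms)
vowel_consonant_ratio = duration_ratio(vowel_ms, closure_ms)
print(tense_lax_ratio, vowel_consonant_ratio)
```

Because each value is a ratio, two speakers with different overall
speaking rates can still be compared on the same scale.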

Duration  Parameters  in  English  and  Other  Languages 

Voice  Onset  Time  (VOT) 

The first parameter of concern in this study is Voice Onset Time
(VOT). Bilabial stop consonants or plosives are usually divided into
two types: voiced and voiceless. The duration of an initial bilabial
stop consonant is, very generally, made up of a closure of the lips
and a subsequent release of pressure built up between the lips and
the glottis. The vocal folds may begin vibrating at some point
during this process. VOT is the temporal relationship between the
onset of glottal pulsing and the release of the initial stop
consonant (Lisker & Abramson, 1964). In some cases, such as medial
stops and word-initial stops that are not utterance-initial in
running speech, the vibrations may be continuous from the preceding
voiced sounds (Lisker & Abramson, 1964; Flege, 1982; Flege & Brown,
1982).
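
The definition of VOT can be stated as a simple subtraction. The
sketch below is a minimal illustration in Python, assuming that the
release and the onset of glottal pulsing have already been located on
a common time axis; the function name and the time stamps are
hypothetical.

```python
def voice_onset_time(release_ms, voicing_onset_ms):
    """Return VOT in ms: onset of glottal pulsing minus stop release.

    Negative values mean that voicing leads the release; positive
    values mean that voicing lags behind it.
    """
    return voicing_onset_ms - release_ms


# Hypothetical time stamps (ms from the start of a recording).
lag_example = voice_onset_time(release_ms=250, voicing_onset_ms=310)
lead_example = voice_onset_time(release_ms=250, voicing_onset_ms=150)
print(lag_example, lead_example)
```

The sign convention matches the tables in this chapter, where voicing
lead is reported with negative values.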

It  is  important  to  consider  these  consonants  outside  their 
language-specific  requirements.   Categorical  perception  on  the  part 
of  the  listener  tends  to  limit  our  hearing  to  the  native  language 
categories  of  /p/  and  /b/  if  we  are  native  English  speakers.   In 
fact,  there  is  closure  and  release  with  glottal  pulsing  beginning  at 
some  point  in  time.   Languages  exist  in  which  glottal  pulsing  begins 
at  different  approximate  points  in  time  for  sets  of  bilabial, 
alveolar,  and  velar  stops.   Lisker  and  Abramson  (1964)  found  that 
languages  generally  choose  2,  3,  or  4  contrastive  points  in  time  to 
begin  voicing,  producing  2,  3,  or  4  phonemic  stop  sets.   Average 
Voice  Onset  Time  (VOT)  tends  to  cluster  in  a  range  around  these 
points in time. Korean, for example, has three categories of
bilabial stops. English uses two bilabial stop categories. English
also has unaspirated voiceless bilabial stops, but these do not occur
in initial position and so do not contrast phonemically with the
aspirated stops, although aspirated and unaspirated bilabial stops
are separate phonemes in, for example, Korean. Also, although Korean
and Thai are both three-category languages, the categories do not
overlap. That is, Korean and Thai each have three phonemic bilabial
stops, but the average VOT values of these stops do not correspond.

A  more  usual  point  of  view  in  discussing  glottal  timing  is  to  say 
that  listeners  perceive  categories  of  stop  consonants  in  a  way  that 
is dependent on the language. That is, for a given onset of pulsing
in a given specific bilabial stop consonant production, the listener
will label that example according to the categories used in the
listener's language. For the present purpose, however, a shift in perspective
is  necessary.   From  the  point  of  view  of  production  rather  than 
perception,  speakers  of  a  language  make  complex  articulatory  and 
laryngeal  adjustments  in  order  to  produce  separate  ranges  of  VOT 
centered  around  specific  points.   Some  problems  become  apparent  when 
the  viewpoint  shifts  from  perception  and  categorization  to  cross- 
language  production  comparisons.  These  problems  will  be  discussed  in 
this  chapter. 

Languages  are  usually  discussed  (Lisker  &  Abramson,  1964)  in 
terms  of  having  voicing  lead,  in  which  the  glottal  pulsing  precedes 
the  release  of  air  pressure  by  50-150  ms;  short  lag  voicing,  in  which 
glottal  pulsing  follows  the  release  by  0-30  ms;  and  long  lag  voicing, 
in which the glottal pulsing follows the release by 50-110 ms. Table
1 shows the classifications of the five languages of interest in this
study in these terms. Languages making use of each category of
bilabial stop are marked with an X. The languages of concern in this
study include one one-category language (Arabic), two two-category
languages (English, Spanish), and two three-category languages
(Korean, Thai).
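
The three descriptive categories can be written out as a small
classifier. The Python sketch below uses only the boundary values
quoted in the paragraph above; the function name is hypothetical, and
treating values that fall between the quoted ranges as "unclassified"
is an assumption of this example rather than a claim from Lisker and
Abramson (1964).

```python
def classify_vot(vot_ms):
    """Label a VOT value (ms) with the category ranges quoted above."""
    if -150 <= vot_ms <= -50:
        return "voicing lead"      # pulsing precedes release by 50-150 ms
    if 0 <= vot_ms <= 30:
        return "short lag"         # pulsing follows release by 0-30 ms
    if 50 <= vot_ms <= 110:
        return "long lag"          # pulsing follows release by 50-110 ms
    return "unclassified"          # falls in a gap between quoted ranges


for value in (-100, 10, 75, 40):
    print(value, classify_vot(value))
```

The gaps between the ranges are a reminder that the categories are
clusters around typical values, not an exhaustive partition of the
time axis.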


Table 1: VOT classification for bilabial stops in five languages

Language    Lead    Short Lag    Long Lag    # of Stops

Arabic       X                                   1
English                 X            X           2
Spanish      X          X                        2
Korean                  XX*          X           3
Thai         X          X            X           3

Source: Arabic data from Yeni-Komshian et al., 1977. All others from
Lisker & Abramson, 1964.
* XX for Korean indicates that Korean uses two short lag
bilabial stops.



Table  2  shows  the  mean  VOT  and  range  of  VOT  used  in  isolated 
words  in  English  and  in  the  other  languages  from  the  Lisker  and 
Abramson (1964) study and Yeni-Komshian, Caramazza, and Preston
(1977),  which  is  included  because  Lisker  and  Abramson  (1964)  did  not 
examine  Arabic.   The  ranges  specified  indicate  the  range  of  time 
within  which  voicing  began  for  speakers  of  the  particular  language  in 
the  context  of  the  study. 

It  is  important  to  note  that  the  Lisker  and  Abramson  (1964)  study 
used  productions  from  only  1  to  4  speakers  of  each  language  resulting 
in  a  limited  number  of  examples  of  each  stop.   No  attempt  was  made  to 
control for the vowel following the stop consonant. The
Yeni-Komshian et al. (1977) study, on the other hand, reported values
for eight speakers for /b/ with a following /i/, /u/, or /a/. The
values for /bi/ are given in Table 2 because this vowel will be used
in the present study. The VOT values for /bi/ found in the
Yeni-Komshian et al. (1977) study range from -125 to +20 ms. (Only
negative values were found for /ba/ and /bu/ in that study.) Arabic
does not have a /p/ phone, and the range found by the Yeni-Komshian
et al. (1977) study seems to imply that close control over the timing
of initiation of glottal pulsing is not required in this language.

Two  values  are  given  in  Table  2  for  English  /b/  although  English 
is  generally  considered  to  have  only  2  phonemic  stop  categories. 
Lisker  and  Abramson  (1964)  found  that  speakers  are  fairly  consistent 
in  their  use  of  either  lead  or  short  lag  voicing  for  /b/.   They 
report  that  one  of  the  four  speakers  was  responsible  for  95%  of  the 
stops  with  voicing  lead  while  two  speakers  produced  no  voicing  leads. 
English  /b/,  though  generally  accepted  to  be  a  stop  requiring  voicing 




Table 2: Mean VOT and range in ms for initial bilabial stops in
isolated words in five languages

Language    Mean VOT                 Range of VOT Values

English     -100    +1    +58        -130:-20    0:5      20:120
Arabic       -40*                    -125:20
Korean        +7   +18    +91           0:15    10:35     65:115
Spanish     -138    +4               -235:-60    0:15
Thai         -97    +6    +64        -165:-40    0:20     25:100

Sources: Lisker & Abramson, 1964, except * which is from
Yeni-Komshian et al., 1977.
* Data reported by Yeni-Komshian et al., 1977, combine
isolated words and sentential productions for /bi/.



at  or  shortly  after  the  release  of  the  stop,  seems  to  allow  optional 
lead voicing. Lead voicing would not cause confusion between the
two  categories  of  stops,  which  are  generally  produced  with  short  and 
long  lag  voicing  initiation. 

Flege  (1982)  looked  at  utterance  initial  /b/  production  using  (a) 
the  acoustic  signal,  (b)  glottal  pulsing  measured  by  a  contact 
microphone  on  the  throat,  (c)  intraoral  breath  pressure,  and  (d)  a 
laryngograph  to  measure  adduction  of  the  vocal  folds.   He  found  that 
the  subjects  were  clearly  differentiating  /b-p/  by  laryngeal  timing 
although in two different ways. Of the ten speakers, two did not
adduct their vocal folds prior to stop release and thus did not
pre-voice. These speakers produced short lag /b/. The other speakers
adducted their vocal folds early, but the resulting acoustic signal
showed either lead or short lag voicing for these subjects'
productions. Speakers consistently chose one or the other pattern of
laryngeal  timing.   Both  ranges  for  /b/  are  reported  here  because  of 
the  nature  of  the  cross-language  comparisons  being  made.  Figure  1 
shows  the  ranges  found  in  the  studies  above  in  a  different  way  for 
easy  comparison. 

It  is  readily  evident  from  Table  2  and  Figure  1  that  the 
languages  of  concern  in  the  present  study  correspond  in  average  VOT 
for  some  initial  bilabial  stops  but  not  for  all  of  them.  Arabic, 
Spanish,  and  Thai  initiate  voicing  for  what  is  usually  considered  a 
/b/  phoneme  before  the  release  of  the  lip  closure  in  production  of 
the  /b/.   The  English  /b/  may  be  voiced  at  the  time  of  release  of  the 
closure  or  may  be  pre-voiced.   Korean,  Spanish,  and  Thai  begin 
voicing  a  bilabial  phoneme  soon  after  release  of  the  pressure,  close 




[Figure 1. Horizontal axis: time of voicing onset in ms, from -240 to
+120. One row of range brackets per language: English /b1/, /b2/,
/p/; Arabic /b/; Korean /b/, /p/, /ph/; Spanish /b/, /p/; Thai /b/,
/p/, /ph/.]

Figure 1: Range of VOT for initial bilabial stops in isolated words
for five languages from Table 2.


to  the  release  time  for  English  /b/,  but  in  these  languages,  the 
sound  is  usually  classified  as  a  type  of  /p/. 

The  languages  above  seem  to  make  use  of  different  categories 
which  are  often  written  with  the  same  phonetic  symbol  (see  Table  4). 
The bilingual's task, then, in speaking English, is not only to
produce /b/ and /p/ accurately but also, perhaps, to produce them with
VOT values at the same relative distance in time as occurs in
English. Evidence has been presented in Chapter 1 indicating that
changes  are  sometimes  made.  The  fact  that  categories  shift  seems  to 
indicate  a  sensitivity  on  the  part  of  the  language  learner  to  this 
parameter  of  speech. 

It  can  be  seen  from  the  values  in  Table  2  that  a  possible  source 
of foreign accent related to timing is the use of L1 glottal timing
in bilabial stop production in L2. For example, if a Thai bilingual
produces  a  word  such  as  "Pete"  using  the  second  category  in  the  Thai 
system  for  the  initial  plosive,  that  word  is  likely  to  sound  like 
"beat"  to  native  English  speakers  because  English  demands  a  long, 
heavy  period  of  aspiration  for  initial  voiceless  stops  and  only 
short,  light  aspiration  is  present.  That  the  second  Thai  category  is 
close  to  the  range  reserved  by  English  speakers  for  the  voiced  stop 
/b/  may  influence  the  perception  of  "beat"  by  the  English  speaking 
listener  when  the  second  Thai  category  is  used. 
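
The "Pete"/"beat" example can be checked against the Table 2 values
with a short sketch. The Python below finds the English range closest
to a given VOT value. It is only an illustration: the nearest-range
rule is an assumption of this example and is not a model of listener
perception, and the function names are hypothetical.

```python
# English ranges (ms) for isolated words, as listed in Table 2.
ENGLISH_RANGES = {
    "/b/ (lead)": (-130, -20),
    "/b/ (short lag)": (0, 5),
    "/p/": (20, 120),
}


def distance_to_range(vot_ms, low, high):
    """Return 0 inside the range, otherwise the distance to its edge."""
    if vot_ms < low:
        return low - vot_ms
    if vot_ms > high:
        return vot_ms - high
    return 0


def nearest_category(vot_ms, ranges):
    """Return the label of the range closest to vot_ms."""
    return min(ranges, key=lambda name: distance_to_range(vot_ms, *ranges[name]))


thai_second_category = nearest_category(6, ENGLISH_RANGES)   # Thai mean +6
thai_third_category = nearest_category(64, ENGLISH_RANGES)   # Thai mean +64
print(thai_second_category, thai_third_category)
```

With the Table 2 means, the second Thai category lies 1 ms outside
the English short lag /b/ range and 14 ms below the English /p/
range, which is consistent with the example in the text.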

The case of the two Korean short lag stops (Table 1), which both
fall within Lisker and Abramson's (1964) definitional values for
short lag stops (0-30 ms post release), deserves a closer
examination, because the specification of these two stops bears on
the present research. It might be thought that these two stops are so
close in average VOT that discriminating them would be difficult, if
not impossible, given normal variations in production. While the
ranges for VOT in these stops are narrow in Korean (see Table 2), the
ranges of production of VOT seem to be fairly wide in other
languages. Lisker and Abramson (1964) discuss the possible problem.
They note that "the discrimination of the two lower-valued categories
is not very good . . . but . . . it may still well be the single most
important measure for separating the [two sounds]" (p. 403). The
descriptions of importance to the present study are contained in
their discussion of the categories of Korean stops. Lisker and
Abramson (1964) report that the first category for Korean has been
called "voiceless, lax and slightly aspirated" (p. 397). They note
that other analysts consider the first category as "a case of
gemination or as a sequence of the single stop phoneme and a phoneme
of 'glottal tension' . . ." (p. 397, internal quotation marks in the
original). In Korean, then, it is possible that subphonetic
parameters of duration or the addition of a glottal factor are
important in differentiating the Korean phonemes. It may be
necessary for future research that considers the effects of phonetic
interference from Korean to English to determine more closely the
production of the stops, especially the acoustic relevance of
"gemination or glottalization."

Because the data for this present study will be taken from words
in carrier sentences, Table 3 and Figure 2 present findings from the
same sources as in Table 2 and Figure 1 for the stops of concern in
sentences. The data for Arabic remain the same because the
Yeni-Komshian et al. (1977) study collapsed results for words and
sentences. The authors report that there were no major differences
between the words in isolation and in sentences. (One exception was
the finding that /bi/ in one word had a longer lead in isolation than
in sentences.)

Lisker and Abramson (1964) also found that sentence data were
congruent with word data, though VOT values were compressed, as will
be seen in Table 3 and Figure 2. The major effect of the additional
context was that in noninitial position, voicing sometimes proceeded
unbroken from the preceding voiced phone. Unbroken voicing was found
for speakers of all the languages included here from the Lisker and
Abramson (1964) study and is noted by an asterisk in Table 3. Flege
and Brown (1982) also found that English speakers
often  produced  unbroken  voicing.   Yeni-Komshian  et  al.  (1977)  do  not 
mention  occurrence  or  nonoccurrence  of  unbroken  voicing. 

Production  of  correct  timing  for  /p/  would  seem  to  be  the  major 
difficulty  facing  L2  speakers.  Evidence  has  been  presented  above 
that  seems  to  indicate  that  acceptable  production  of  /b/  in  sentences 
in  English  may  be  performed  with  (a)  unbroken  voicing  from  the 
preceding  voiced  phone,  (b)  voicing  lead,  or  (c)  short  lag  voicing. 
Temporal specifications used in the L1s of concern here may
be applied to the production of English /b/. However, L2 speakers
will not necessarily apply the correct timing specifications to
English /b/ even if the speakers have a /b/ in the repertoires for
their L1.



Table 3: Mean VOT and range in ms for initial bilabial stops in
sentences in five languages

Language    Mean VOT                 Range of VOT Values

Arabic       -40                     -125:20
Korean        +5   +13*   +75           0:10    10:20     40:130
Spanish      -90*   +4                -90 (a)    0:15
Thai         -66*  +11    +50         -90:-20    0:20     25:80

Sources: Lisker & Abramson, 1964, except Arabic, which is from
Yeni-Komshian et al., 1977.
Note: Data reported by Yeni-Komshian et al., 1977, combine isolated
words and sentential productions for /bi/.
* indicates unbroken voicing continuing from the previous
voiced sound.
(a) only one example without unbroken voicing was obtained.


[Figure 2. Horizontal axis: time of voicing onset in ms, from -240 to
+120. One row of range brackets per language: English /b1/, /b2/,
/p/; Arabic /b/; Korean /b/, /p/, /ph/; Spanish /b/, /p/; Thai /b/,
/p/, /ph/.]

Figure 2: Range of VOT for initial bilabial stops in sentences
for five languages from Table 3



Positional  allophone  usage  also  varies  between  languages  and  may 
interfere  with  choices  made  in  the  production  of  L2  sounds.   Keating 
et  al.  (1983)  compared  51  languages,  including  those  of  interest 
here,  examining  phonemes  and  positional  allophones  for  these 
languages.  These  findings  are  summarized  in  Table  4. 

It  is  readily  obvious  from  Table  4  that  Arabic  lacks  /p/,  Korean 
lacks  initial  and  final  /b/  and  /d/,  Spanish  has  no  final  stops,  and 
positional  allophones  vary  greatly.   One  difference  between  the  data 
reported  in  Figure  1  and  Figure  2  and  the  data  reported  in  Table  4 
should be noted. For Korean, Lisker and Abramson (1964) use /b p ph/
phonemes. Keating et al. (1983) label these phonemes as /p ph P/
with  /b/  considered  to  be  a  positional  allophone  in  medial  position. 
The  labeling  difference  may  be  due  to  theoretical  considerations  not 
made  clear  or  the  difference  may  simply  be  due  to  the  fact  that  all 
three  phonemes  are  produced  with  positive  VOT.   For  the  present 
purpose,  labeling  is  less  important  than  VOT. 

The  L2  speaker  might  experience  problems  and  thus  evidence  an 
accent  due  to  incorrect  positional  allophone  use.  These  positional 
allophones  vary  in  onset  of  glottal  pulsing,  and  the  same  IPA  symbol 
may  be  used  to  refer  to  different  phones.   For  example,  although 
Spanish  has  stop  phonemes,  both  voiced  and  voiceless  stops  become 
fricatives  in  final  position  and  the  voiced  stops  become  fricatives 
in  medial  position.   For  the  word  "bead"  in  a  carrier  sentence,  then, 
Spanish  speakers  are  likely  to  substitute  frication  for  stopping. 
Superior  bilinguals,  however,  could  be  expected  to  use  stops  in  final 
and  medial  position. 




Table 4: Phonemes and positional allophones; bilabial and alveolar
stops for five languages

Language    Phoneme    Initial    Medial    Final

English 

P  b 

ph  b 

b 

Q 

ph  p 

b 

b 

o 

p  ph  p*  b  b  & 

0 

Arabic 

b 

b 

b 

b 

Korean 

p  ph  P 

P  Ph 

P 

b  ph 

P 

Q 

P 

Spanish 

P  b 

P   b 

P  P 

P 

Thai 

p  ph  b 

P  Ph 

b 

P  Ph 

b 

b 

0 

English 

t  d 

th  d 

d 

0 

th  d 

0 

(1 

c 

t  tht*  d  d  d° 

0 

Arabic 

t  d 

t  d 

t  d 

t  d 

Korean 

t  th  T 

t  th 

T 

d  th 

T 

t 

Spanish 

t  d 

t  d 

t  5 

5 

Thai 

t  th  d 

t  th 

d 

t  th 

(1 

cl 

Q 

Source: Keating et al., 1983.


The  present  study  will  examine  the  voice  onset  time  of  initial 
bilabial stop consonants to see if superior bilinguals use L1 timing,
English timing, or something in between. Research reviewed in
Chapter 1 indicates that the VOT of L2 speakers is usually
intermediate to the values specified by L1 and L2. For example, Flege
(1979) found that Arabic speakers usually achieve a /p/ when learning
English, though it is a voiceless Arabic /b/. The Arabic speakers in
Flege's (1979) study were divided into two groups and were compared
with a group of native English speaking Americans. The two Arabic
speaker groups differed in length of time in an English speaking
environment though both groups had had native English speaking
instructors at the university level. One group had had more than two
years  experience  in  an  English  speaking  environment  compared  with 
less  than  one  year  for  the  other  group. 

VOT  for  both  Arabic  groups  was  found,  in  the  Flege  (1979)  study, 
to  be  significantly  different  from  VOT  for  the  American  group.   The 
VOT values found were 46.4 ms (s.d. 10) for the Americans, 21.3 ms
(s.d. 18) for the experienced native Arabic speakers, and 14.2 ms
(s.d. 14) for the less experienced native Arabic speakers. The more
experienced Arabic group seemed to have changed their VOT but this
apparent difference between the Arabic groups was not statistically
significant (Flege, 1979).

The findings reviewed above and in Chapter 1 seem to show that
the more experienced Arabic bilinguals were attempting to reproduce
the timing required in English but were not succeeding. The speakers
had acquired a contrast not present in their L1 (/p-b/) but had not
acquired the temporal implementation of the contrast. That is, they
had not acquired the relative timing of the onset of the glottal
pulses that is used by native speakers of American English.

Statistically  significant  effects  were  found  by  Flege  (1987)  in  a 
study of English-French and French-English bilinguals. Table 5 gives
data for only part of the study, which was described in
Chapter 1.

While the bilinguals did not differ significantly from each other
in the Flege (1987) study, the speakers differed significantly in
mean VOT from monolingual speakers of both their L1 and their L2.
Flege and Hillenbrand (1984) would predict that the present study
will find that superior bilinguals use VOT which is significantly
different from the values specified in English. (Because L1 values
could not be examined, except in general, in the present study, only
the L2 aspect is relevant.)
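
One way to express how far the bilinguals in Table 5 have moved
toward the L2 norm is as a proportion of the distance between the two
monolingual means. The Python sketch below does this with the Table 5
values. The measure itself is an assumption introduced for
illustration and is not a statistic reported by Flege (1987); the
function name is hypothetical.

```python
def proportion_of_shift(l1_norm_ms, l2_norm_ms, produced_ms):
    """Return 0.0 at the L1 monolingual mean and 1.0 at the L2 mean."""
    if l1_norm_ms == l2_norm_ms:
        raise ValueError("the two norms must differ")
    return (produced_ms - l1_norm_ms) / (l2_norm_ms - l1_norm_ms)


# Mean VOT values (ms) as read from Table 5.
ENGLISH_MONOLINGUAL = 77
FRENCH_MONOLINGUAL = 33

# French bilinguals speaking English (mean 49 ms).
french_in_english = proportion_of_shift(FRENCH_MONOLINGUAL,
                                        ENGLISH_MONOLINGUAL, 49)
# English bilinguals speaking French (mean 43 ms).
english_in_french = proportion_of_shift(ENGLISH_MONOLINGUAL,
                                        FRENCH_MONOLINGUAL, 43)
print(round(french_in_english, 2), round(english_in_french, 2))
```

Both proportions fall between 0 and 1, which restates numerically
that each bilingual group produced L2 values between the two
monolingual means.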

Vowel  Duration 

Vowel  duration  is  the  second  durational  parameter  to  be  examined 
in  this  study.   Vowel  duration  in  English  is  known  to  be  affected  by 
a  variety  of  factors,  which  may  include  position  in  utterance,  type 
of  vowel,  stress,  speech  rate,  context,  and  word  length  (Delattre, 
1962;  House,  1961;  Umeda,  1975;  Port,  1981).  Research  in  English  has 
largely  focused  on  the  shortening  or  lengthening  effects  of  various 
factors  in  additive  models.   Across  contexts  and  effects,  however, 
/i/  is  generally  accepted  to  be  longer  than  /I/.   In  addition,  there 
is  some  evidence  that  a  language  may  have  its  own  general  language- 
dependent  duration  requirements  for  these  vowels  (Bush,  1967; 



Table 5: Mean /t/ VOT in ms from Flege (1987)

Group                  Language spoken      Mean VOT

Monolingual English    English              77
Monolingual French     French               33
Bilingual French       French / English     47 / 49
Bilingual English      English / French     56 / 43

Source: Flege, 1987.



Barry,  1974;  Chen,  1970).  This  general  duration  requirement  may 
differ  from  language  to  language  and  may  be  confounded  by  the  fact 
that  some  languages  have  phonemic  vowel  length.   Therefore,  a 
language  may  use,  for  example,  short  vowels  relative  to  vowels  found 
in  English  but  may  have  a  phonemically  long  and  short  version  of  each 
vowel,  both  versions  of  which  are  shorter  than  a  corresponding 
English  vowel. 

Data on the factors affecting vowel duration in languages other than
English are not available. Even for English, research paradigms differ
and it is difficult to predict vowel length outside of context based on
the research. For example, some researchers include the burst of the
preceding stop consonant in the measurement of vowel duration while
other researchers do not. Because vowel length is
dependent on these factors, no ideal durations will be given here.
Sets  of  vowels  across  languages  are  even  more  difficult  to  compare 
(Delattre,  1962)  because  such  factors  as  phonotactic  rules  differ 
across  languages. 

The  present  study  will  be  concerned  with  the  /i-I/  length 
difference.   Table  6  presents  some  findings  with  regard  to  English 
and  the  other  languages  of  concern  here. 

A. English has phonemic /i/ and /I/. Vowel length is not
phonemic.

B. Arabic has phonemic /i/. Vowel length is phonemic (Mitleb,
1984a).

C. Korean has /i/ and /I/ allophones. (/I/ does not occur
before voiceless consonants.) Vowel length is phonemic (Jung, 1962).

D. Spanish has phonemic /i/. Vowel length is not phonemic
(White, 1979).

E. Thai has /i/ and /I/ allophones. Vowel length is phonemic
(Linananda, 1964; Abramson, 1962).

TABLE 6: Vowel phonemes in five languages

Language   Phonemes     Allophones   Phonemic Length?   Source
English    /i/, /I/                  No
Arabic     /ii/, /i/                 Yes                Mitleb, 1984a
Korean     /ii/, /i/    /I/          Yes                Jung, 1962
Spanish    /i/                       No                 White, 1979
Thai       /ii/, /i/    /I/          Yes                Linananda, 1964

Note: /ii/ indicates a phonemically long vowel.

For  speakers  of  Arabic,  Korean,  and  Thai,  phonemic  vowel  length 
means  that  the  speakers  are  accustomed  to  differentiating  words  based 
on  duration  of  vowels.  Research  indicates  that  the  ratio  between  the 
long  and  short  vowel  is  around  .5  (Lehiste,  1970;  Flege,  1979; 
Mitleb,  1984a).  But  using  phonemic  vowel  length  to  differentiate 
words  is  not  the  same  thing  as  using  the  general  vowel  length  and 
adjustments  for  context  required  by  English.   One  piece  of  evidence 
showing  that  this  may  be  true  comes  from  native  English  speakers 
attempting  to  learn  to  use  phonemic  vowel  length.   Jonasson  and 
McAllister  (1972)  found  that  Americans  learning  Swedish  vowels  were 
unable  to  reliably  contrast  the  duration  of  phonemically  long  and 
short  versions  of  the  same  vowel  in  Swedish. 

Foreign  accent,  therefore,  may  be  due  in  some  cases  to  applying 
long/short  vowel  contrasts  (for  phonemically  long  and  short  versions 
of  the  same  vowel)  to  the  English  distinction.   A  speaker  whose 
native  language  has  phonemically  long  and  short  vowels  may  produce 
all  short  or  all  long  vowels  when  speaking  English.   Alternatively,  a 
speaker  may  categorize,  for  example,  /i/  with  the  long  vowels  and  /I/ 
with the short vowels. Impressionistically, students in some stages
of  learning  English  as  a  second  language,  whose  native  languages  have 
phonemic  vowel  length,  seem  to  be  attempting  to  categorize  /i/  and 
/I/  as  long  and  short  versions  of  one  vowel. 



Korean  and  Thai  have  allophonic  variations  of  /i/  which  are 
realized  as  /I/  (Jung,  1962;  Abramson,  1962).  Cefola  (1981)  reports 
that  native  speakers  of  Thai  have  little  difficulty  with  English  /I/ 
although  durational  specifications  are  not  mentioned  by  the  author. 
For native speakers of Korean, however, the /I/ allophone does not
occur before voiceless consonants and may be expected to be a
problem both in realization of the phone and in durational
specification (Jung, 1962), although superior bilinguals will have
mastered the positional use of /I/.

If speakers of English as a second language apply any of the L1
factors affecting vowel duration requirements to English, a possible
source of foreign accent exists. Flege (1979) notes that "correct
specification of vowel duration appears to be a major learning
difficulty for Arabs speaking English" (p. 80).

Few data are available concerning general vowel duration for
nonnative speakers of English in their L2. Bush (1967) measured
vowel  duration  in  VCVC  nonsense  syllables  for  native  speakers  of 
American  English,  British  English,  and  for  speakers  of  Indian  English 
as  a  second  language.   Stressed  vowel  duration  requirements  seemed, 
in  this  study,  to  differ  even  between  the  British  and  American 
English  dialects  as  well  as  among  American,  British,  and  Indian 
English,  although  Bush  reported  that  the  consonant/vowel  ratios  were 
more  important  in  segmenting  the  groups  of  speakers. 

Barry  (1974),  in  a  comparison  of  tense  and  lax  vowel  production 
for  two  dialects  of  German  and  one  of  English,  found  tense/lax  ratios 
which differed both between languages and between two dialects of one
language. Barry (1974) found that the ratio of /i-I/ was .95 for
English,  .86  for  Dialect  1,  and  .65  for  Dialect  2. 

Mitleb  (1984a)  compared  vowel  duration  in  Arabic  and  English. 
Both  sets  of  speakers  produced  a  significant  difference  in  tense/lax 
vowel  ratio.   In  the  Mitleb  study,  native  English  tense/lax  ratio  was 
84%  while  the  tense/lax  ratio  in  native  Arabic  was  65%.   In  speaking 
English,  however,  native  speakers  of  Arabic  applied  the  Arabic  ratio 
and  produced  a  67%  difference  between  tense  and  lax  vowels  (Mitleb, 
1981). 

For  all  nonnative  speakers  in  the  present  study,  the  native 
language  uses  an  /i/  vowel.   In  speaking  English,  this  /i/  will  need 
to  be  produced  according  to  the  duration  requirements  of  English. 
The  /I/  vowel  is  available  as  an  allophone  to  Korean  and  Thai 
speakers  although  not  to  native  speakers  of  Arabic  and  Spanish.   All 
of  the  nonnative  speakers  will  need  to  produce  the  /I/  in  accordance 
with  English  duration  parameters. 

The  present  study  will  examine  durations  of  /i/  and  /I/  as  well 
as  the  ratio  of  the  tense  vowel  to  the  lax  vowel  to  see  if  superior 
bilinguals use the timing required by their L2, English. Flege and
Hillenbrand  (1984)  predict  that  the  durations  and  ratios  of  even 
superior  bilinguals  will  be  significantly  different  from  the 
durations  and  ratios  produced  by  native  speakers  of  American  English. 

Final  Consonant  Closure 

The  duration  of  the  closure  for  the  final  alveolar  stop  consonant 
is the third durational parameter to be examined in this study. In
English, the closure duration of post stressed /t/ is generally
longer  than  the  closure  duration  of  post  stressed  /d/  (Lisker,  1957; 
Port,  1981)  at  least  in  careful  speech.   In  less  careful  speech  or  in 
intervocalic  position  in  connected  speech,  the  ratio  of  /t/  to  /d/ 
may  increase  (Klatt,  1976;  Umeda,  1977).  For  example,  Sharf  (1962), 
using  intervocalic  stops  in  bisyllabic  words  produced  by  one  speaker, 
found  that  the  average  /t/  and  /d/  showed  a  40  ms  difference  with  /d/ 
being  the  longer  instead  of  the  shorter  consonant  in  this  study. 

One  further  confounding  factor  in  the  description  of  word  final 
but  not  utterance  final  alveolar  stops  in  English  is  the  number  of 
release  options  available  in  American  English.   There  are  at  least 
three  options. 

One  option  is  to  release  the  final  stop  normally.   This  option  is 
likely  to  be  used,  in  English,  most  often  when  the  word  in  which  the 
sound  occurs  is  in  citation  form  or  in  careful  speech.   Another 
option  is  not  to  release  the  consonant  until  the  onset  of  the 
following  phone.   Choosing  this  option  could  result  in  overly  long 
closure  durations. 

A  third  option  for  Americans  is  flapping  the  /t/  or  the  /d/. 
Flapping  is  generally  defined  as  a  /t/  or  /d/  with  a  closure  duration 
ranging  from  10-40  ms  (Zue  &  Laferriere,  1979).   Flapping  may  apply 
intervocalically  across  word  boundaries  in  American  English.  Luce 
and  Charles-Luce  (1985)  found,  in  general,  that  the  difference  in 
closure  duration  failed  to  distinguish  /t/  from  /d/  in  connected 
speech  in  English.   The  Americans  in  Flege's  (1979)  study  showed  a 
significant difference in /t-d/ ratio but the author felt it was
because more /d/s were flapped than /t/s. Reduction in closure
duration  also  may  become  apparent  when  rate  increases  (Port,  1981). 
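
The closure-duration definition of flapping cited above can be made concrete with a short sketch. The Python fragment below is illustrative only: the function names and the sample closure durations are hypothetical, and the 10-40 ms range is simply the one attributed to Zue and Laferriere (1979), treated here as inclusive at both ends by assumption.

```python
# Hypothetical helpers: label a stop closure as a flap when its duration
# falls in the 10-40 ms range cited from Zue & Laferriere (1979).
# Treating both boundaries as inclusive is an assumption of this sketch.
FLAP_MIN_MS = 10.0
FLAP_MAX_MS = 40.0


def is_flap(closure_ms: float) -> bool:
    """Return True if the closure duration lies within the flap range."""
    return FLAP_MIN_MS <= closure_ms <= FLAP_MAX_MS


def proportion_flapped(closures_ms: list[float]) -> float:
    """Share of tokens whose closure duration falls in the flap range."""
    if not closures_ms:
        raise ValueError("no closure durations supplied")
    return sum(is_flap(c) for c in closures_ms) / len(closures_ms)


# Invented closure durations in ms, for illustration only.
example_closures = [25.0, 38.0, 62.0, 81.0]
print(proportion_flapped(example_closures))  # 0.5
```

A duration criterion of this kind would presumably be only one of several cues used in practice; the sketch shows how a proportion of flapped tokens could be derived from measured closures.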

Table  4  presented  data  showing  that  only  English  and  Arabic  have 
a  word  final  voiced  and  voiceless  stop  consonant  contrast  (though 
Arabic  lacks  a  contrast  for  /b/)  and  only  these  languages,  therefore, 
should  exhibit  differences  in  closure  duration  in  final  position. 
Data  from  research  on  Arabic  (Port  et  al.,  1980;  Port  &  Flege,  1981), 
however,  showed  no  difference  in  the  closure  interval  of  word  final 
voiced  and  voiceless  stops,  although  a  difference  in  closure  interval 
was  found  in  word  initial  stops.   Arabic  seems  to  rely  on  the 
presence  or  absence  of  voicing  during  the  closure  to  distinguish  /t/ 
and  /d/  (Port  &  Mitleb,  1983).  Spanish,  Korean,  and  Thai  do  not  have 
final  voiced  and  voiceless  stop  distinctions. 

Prediction  of  nonnative  speaker  final  consonant  closure  duration 
is  complicated  by  the  syllable  structure  and  allophonic  variation 
rules  used  by  the  languages  of  concern  here.  Thai,  for  example,  does 
not  permit  syllable  final  /d/  (Cefola,  1981),  instead  requiring  a 
/t/.  Keating  et  al.  (1983),  as  shown  in  Table  4,  reported  this 
alveolar  stop  in  final  position  as  a  voiceless  /d/.  Labeling  aside, 
only  one  phone  is  permitted  in  syllable  or  word  final  position. 

In  Spanish,  intervocalic  stops  become  fricatives.  Korean,  on  the 
other  hand,  seems  to  permit  intervocalic  voiced  and  voiceless 
alveolar  stops  (Chen,  1970),  although  it  does  not  permit  final  voiced 
stops. 

Some  data  are  available  concerning  final  consonant  closure 
duration  by  native  Arabic  speakers  in  English.   Fourakis  and  Iverson 
(1985) found that native speakers of Arabic produced longer closures
in English than did native speakers of English. The data in Flege
(1979)  showed  that  Arabic  speakers  produced  a  significant  difference 
between  /t/  and  /d/  in  English  and  that  their  closure  durations  were 
longer  than  those  of  Americans. 

English  dialect  may  also  present  difficulties  in  this  study. 
Vowel  duration  differences  in  English  dialects  were  mentioned  in  the 
preceding section. Many speakers of English as a second language
learned English from speakers of dialects of British English, which
do not generally permit flapped alveolar stops at word boundaries.

Flege  and  Port  (1981),  in  another  study  of  Arabic  speakers  of 
English,  found  that  the  L2  speakers  in  their  study  did  not  flap  /t/ 
or  /d/,  possibly  because  they  learned  English  from  speakers  of 
British  English.   In  the  Mitleb  (1981)  study,  however,  both  Americans 
and  the  Arabic  speakers  of  English  who  had  learned  English  from 
speakers  of  British  English  flapped  a  good  proportion  of  the  /t/  and 
/d/  productions.  A  similar  finding  was  reported  in  Port  and  Mitleb 
(1985). Table 7 presents these data.

Superior bilinguals could be expected to have acquired the
phonological rule for flapping in American English though not the
durational contrast of unflapped /t/ and /d/.

The  present  study  will  examine  the  closure  duration  of  /t/  and 
/d/  as  well  as  the  ratio  of  /t/  to  /d/  to  see  if  superior  bilinguals 
use  American  English  timing.   Flege  and  Hillenbrand  (1984)  predict 
that  the  duration  and  ratios  of  even  superior  bilinguals  will  be 
significantly  different  from  the  duration  and  ratios  produced  by 
native  speakers  of  American  English. 



TABLE 7: Proportion of flapped /t/ and /d/ by Americans and Arabic
speakers of English from two sources

                    Mitleb, 1981      Port and Mitleb, 1985
Language            /t/      /d/      /t/      /d/
American English    56%      80%      69%      67%
Arabic English      54%      66%      59%      56%


Vowel/Consonant  Ratio 

The  interaction  between  the  length  of  the  vowel  and  the  voicing 
of  the  final  consonant  is  the  fourth  parameter  to  be  examined  in  this 
study.   In  English,  the  relationship  between  duration  of  a  stressed 
vowel  and  the  following  consonant  conforms  to  a  pattern  (Port,  1981; 
Klatt,  1976). 

A.  Vowels  are  longer  before  voiced  consonants  than  before 
voiceless  consonants  (Denes,  1955;  Chen,  1970;  Luce  &  Charles-Luce, 
1985). 

B.  Post  vocalic  voiceless  stops  are  longer  than  post  vocalic 
voiced  stops  (Lisker,  1957;  Port,  1981). 

In  English,  then,  there  is  usually  an  inverse  relationship 
between  vowel  duration  and  the  length  of  the  following  consonant.   If 
this  set  of  relationships  is  universal,  then  there  is  no  need  to 
specify it in the phonology of the language. If any of the relationships
are not universal, however, it or they must be included in the
phonology. Chapter 1 discussed the difficulty of including nonsegmental
factors in a segment based model.

The preceding section reviewed some final consonants for the
languages of concern in the present study. Only English and Arabic
have a word final voiced and voiceless stop consonant contrast.
Table 8 summarizes the published data on the vowel/consonant ratio
which indicate that the vowel duration difference related to
consonant voicing may be important in medial position in Korean and
Spanish. Word samples for the present study will be taken from
monosyllabic words in a carrier sentence. Word final consonants of
interest here are, therefore, not utterance final.

TABLE 8: Vowel duration before voiced and voiceless consonants

           Duration       Ratio       Mean Difference
Language   difference?    of diff.    in ms              Source
English    yes            .61         138                Chen, 1970
Arabic     no             --          --                 Mitleb, 1984a
Korean     yes            .78         15                 Chen, 1970
Spanish    yes            .86         18                 Zimmerman & Sapon, 1958
Thai       no             --          --                 Cefola, 1981

Thai  does  not  permit  syllable  or  word  final  voiced  consonants. 
For  this  reason,  Table  8  shows  Thai  as  having  no  vowel  duration 
difference  before  voiced  and  voiceless  consonants.   On  the  other 
hand,  Korean,  while  having  no  final  voiced  consonants,  seems  to  show 
a  voicing  effect  in  medial  intervocalic  position  (Chen,  1970). 
Spanish  also  presents  a  problem  because  final  or  intervocalic  stops 
become  fricatives.   It  is  difficult  to  separate  the  effects  of 
lengthening  or  shortening  produced  by  following  frication  from  the 
voicing  effect,  which  seems  very  small  in  any  case  (Delattre,  1962). 
Arabic,  as  examined  in  Mitleb  (1984b),  seems  to  be  a  clear  case  of  no 
effect  on  vowel  duration  related  to  the  voicing  of  the  following 
consonant. 

When  speaking  English,  native  Arabic  speakers  do  not  produce  the 
voicing  effect  required  by  English.   Over  all  the  vowels  tested  in 
Mitleb  (1981),  the  ratio  of  the  voicing  effect  was  .91,  compared  to 
.80  for  the  English  speakers.   Table  9  shows  the  results  obtained  by 
Mitleb  (1981)  in  a  comparison  of  native  English  and  Arabic  accented 
English  for  the  vowels  of  interest  in  the  present  study. 

For native Thai speakers, Kruatrachue (1960) reported that
speakers of English as a second language apply the phonemic vowel
length contrast to the voicing requirements. Thai L2 speakers might,
therefore,  be  expected  to  show  durational  differences  before  voiced 
and  voiceless  consonants  in  English  but  that  difference  would  be 
related  to  the  difference  in  length  between  the  long  and  short 
versions  of  a  vowel  in  Thai. 




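TABLE 9: Vowel duration in ms before /t/ and /d/ in American English
and Arabic English, from Mitleb (1981) (s.d. in parentheses)

         American English                    Arabic English
Vowel    before /t/   before /d/   Ratio     before /t/   before /d/   Ratio
/i/      147 (30)     189 (38)     .79       113 (25)     132 (31)     .86
/I/      132 (29)     163 (27)                78 (17)      89 (23)

Note: One ratio is given for each group of speakers.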
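
The ratios reported with Table 9 can be checked against the mean durations. The sketch below is a minimal illustration, not the procedure of the original study: the function names are hypothetical, the assignment of the two rows to /i/ and /I/ is an assumption, and the source does not state how the single ratio per group was obtained. Dividing the duration before /t/ by the duration before /d/, pooled over both vowels, appears to reproduce the printed values of .79 and .86.

```python
# Mean vowel durations in ms as printed in Table 9: (before /t/, before /d/).
# Which row is /i/ and which is /I/ is an assumption of this sketch.
TABLE_9 = {
    "American English": {"i": (147, 189), "I": (132, 163)},
    "Arabic English": {"i": (113, 132), "I": (78, 89)},
}


def voicing_ratio(before_t: float, before_d: float) -> float:
    """Vowel duration before voiceless /t/ divided by duration before /d/."""
    return before_t / before_d


def pooled_voicing_ratio(vowels: dict) -> float:
    """Ratio of summed durations before /t/ to summed durations before /d/."""
    total_t = sum(t for t, _ in vowels.values())
    total_d = sum(d for _, d in vowels.values())
    return total_t / total_d


for group, vowels in TABLE_9.items():
    per_vowel = {v: round(voicing_ratio(*d), 2) for v, d in vowels.items()}
    print(group, per_vowel, round(pooled_voicing_ratio(vowels), 2))
```

A ratio nearer 1.0 indicates a smaller voicing effect, which is the direction of the difference reported for the Arabic speakers of English.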


The  present  study  will  examine  the  vowel/consonant  ratio  to  see 
if  superior  bilinguals  use  American  English  timing.   Flege  and 
Hillenbrand  (1984)  predict  that  the  ratios  of  even  superior  bilinguals 
will  be  significantly  different  from  the  ratios  produced  by  native 
speakers  of  American  English. 

Summary 

This  section  has  attempted  to  define  parameters  for  testing  the 
claim  that  even  superior  bilinguals  will  not  match  the  temporal 
implementation  required  for  English  sounds.   Absolute  parameters  do 
not  exist,  given  the  many  factors  which  may  influence  duration. 
However,  general  tendencies  have  been  reviewed  in  this  chapter  and 
these  tendencies  may  be  compared  with  the  findings  of  the  present 
research. 

Data  are  also  not  available  for  many  of  the  durations  to  be 
measured  in  this  study  for  the  non-English  languages  of  concern  here. 
Where those data are available, attempts will be made to relate the
results of the present study to L1 and L2.


CHAPTER  3 
METHODS 

Introduction 


The purpose of the present study is to investigate the claim (Flege
& Hillenbrand, 1984) that even superior bilinguals will differ from
native  American  English  speakers  in  subphonemic  timing  as  revealed  in 
acoustic  measurements.   Four  subphonemic  duration  parameters  were  chosen 
for  examination.   These  parameters  were  VOT,  vowel  duration,  final  stop 
consonant  duration  and  the  interrelationship  between  the  length  of  the 
vowel  and  the  voicing  of  the  final  stop  consonant.   This  chapter  will 
describe  the  procedures  used  in  the  selection,  collection  and  analysis 
of  the  data  which  form  the  basis  of  the  present  study. 

Subjects 

Five groups of six speakers each participated in the study. The
American  group  consisted  of  four  females  and  two  males.   The  four 
females  were  speech  pathology  and  audiology  students  at  The  George 
Washington  University.   One  male  was  a  professor  in  the  Speech  and 
Hearing  Department,  and  one  male  was  a  graduate  student  in  international 
law.   The  Americans  were  all  between  the  ages  of  22  and  38  (mean  age  = 
28  years).  None  of  the  Americans  had  any  reported  history  of  speech  or 
hearing  impairment. 


In  addition  to  English,  four  languages  were  chosen  from  which  to 
select  speakers  for  this  study.   The  non-English  languages  chosen  were 
Arabic,  Korean,  Spanish,  and  Thai.   These  languages  were  selected 
because  published  data  were  available  for  at  least  some  of  the  timing 
factors  to  be  studied  for  the  native  languages.   The  available  data 
indicated  that  the  languages  differ  from  English  in  different  ways. 
Therefore,  the  adjustments  made  by  superior  speakers  for  timing  in 
producing  English  will  need  to  be  of  different  kinds.   Chapter  2 
reviewed  published  data  on  the  timing  factors  that  will  be  included  in 
this  study. 

Nonnative  Subject  Selection 

Nonnative subjects were chosen in the preliminary selection because
they seemed to be good to excellent speakers of English as a second
language. The present researcher has taught pronunciation of English
for foreign students for six years and made the preliminary selection
based on experience with foreign accent. Subjects were asked to
recommend others in their ethnic community who were excellent speakers. If
these people met the criteria for selection, they were included in the
preliminary set of subjects. Twenty-four subjects were eventually
chosen for this study, including six speakers each from Arabic, Korean,
Spanish, and Thai. The nonnative subjects were all 17 years of age or
older and had no reported speech or hearing limitations aside from
accent. None of the speakers reported any difficulty in being
understood in English.



The  Arabic  group  consisted  of  four  males  and  two  females  ranging  in 
age  from  18  to  42  (mean  age  =  25  years).  One  female  subject  was  a 
professor  of  psychology  while  the  rest  were  undergraduate  students.  The 
Arabic  speakers  were  all  born  in  different  Arabic  speaking  countries. 

The Korean group consisted of five females and one male ranging in
age from 17 to 25 (mean age = 20 years). All were undergraduate
students born in Korea.

The Spanish group consisted of five males and one female ranging in
age from 19 to 29 (mean age = 23 years). All were undergraduate
students. The Spanish speakers were born in five different Spanish
speaking countries.

The Thai group consisted of four females and two males ranging in
age from 21 to 41 (mean age = 31 years). Three of the Thai speakers were
undergraduate students while three worked at the Royal Thai Embassy.
All of the Thai speakers were born in Thailand.

Table  10  provides  summary  data  on  the  nonnative  subjects.   A  mean 
rating  of  accent  strength  for  each  subject  is  also  included  and  will  be 
discussed  in  a  separate  section.   Rating  was  performed  on  a  three  point 
scale.  A  rating  of  1  indicated  that  the  speaker  was  perceived  to  be  a 
native  speaker  of  American  English.   A  rating  of  2  indicated  a  slight 
perceived accent while a rating of 3 indicated a medium to heavy
perceived accent. The figures given in Table 11 are mean ratings by 21
judges.  These  ratings  were  used  to  place  speakers  into  groups  in  order 
to  compare  the  productions  of  native  speakers  of  American  English  with 
nonnative  productions. 
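
The arithmetic behind a mean accent rating can be illustrated briefly. In the sketch below, the function name and the split of individual judgements are hypothetical; only the three point scale and the number of judges (21) come from the text. One split that would yield a reported mean of 1.09524 is 19 ratings of 1 and two ratings of 2, although the individual judgements are not given here.

```python
def mean_accent_rating(ratings: list[int]) -> float:
    """Average a set of judge ratings made on the 1-3 accent scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r not in (1, 2, 3) for r in ratings):
        raise ValueError("ratings must be 1, 2, or 3")
    return sum(ratings) / len(ratings)


# Hypothetical judgements from 21 judges: 19 heard a native speaker (1)
# and two heard a slight accent (2). The real per-judge data are not shown.
hypothetical_ratings = [1] * 19 + [2] * 2
print(round(mean_accent_rating(hypothetical_ratings), 5))  # 1.09524
```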




Table 10: Nonnative subject data and accent rating

                                Years       Age at first      Mean
      Country                   of U.S.     Exposure to       Accent
Ss    of Birth      Age   Sex   Residence   Native English    Rating
3     Colombia      20    M     10          10                1.00000
4     El Salvador   22    M     1           10                2.00000
5     Nicaragua     22    M     12          10                1.09524
6     Guatemala     19    F     0.5         13                2.09524
7     Colombia      23    M     10          13                1.04762
8     Venezuela     29    M     3.5         25                2.95238
9     Korea         19    M     8           11                2.19048
10    Korea         17    F     7           10                2.00000
11    Korea         25    F     12          13                2.42857
12    Korea         20    F     9           11                1.38095
13    Korea         20    F     9           11                2.14286
14    Korea         20    F     6.5         14.5              2.28571
15    Thailand      21    F     4.5         6                 2.42857
16    Thailand      41    F     14          16                2.57143
17    Thailand      34    M     2           18                2.61905
18    Thailand      39    F     14          10                2.90476
19    Thailand      22    M     1           16                3.00000
20    Thailand      30    F     2           10                1.52381
22    Bahrain       20    M     2.5         16                2.42857
23    Egypt         42    F     22          8                 3.00000
24    Jordan        22    M     2           20                2.57143
25    Palestine     27    M     5.5         8                 1.61905
26    Lebanon       20    M     3           5                 2.23810
32    Syria         18    F     2           5                 1.33333

The  Age  Factor  in  Nonnative  Subject  Selection 

The  present  study  is  not  concerned  with  predicting  second  language 
pronunciation  acquisition  and  the  role  various  factors  play  in  that 
acquisition.   The  present  purpose  is  to  explore  an  aspect  of  a  model 
that  predicts  that  bilinguals  will  not  be  able  to  acquire  the  details  of 
timing.   Flege  &  Hillenbrand  (1984)  do  not  specify  age  as  a  factor  in 
their prediction. However, Flege (1986) reviewed the literature
concerning the influence of age of first exposure on authenticity of
accent. He concluded that "the age at which L2 learning commences will
significantly affect L2 pronunciation, at least for children and
adolescents learning an L2 naturalistically" (p. 22). Until recently, few
studies of acquisition by adults have taken into account the age of
first exposure, and we are far from certain about when the end of a
favored acquisition period, a "critical period" for authentic
pronunciation, might come or if indeed such a critical period or sensitive period
during which learning is facilitated exists. None of the nonnative
subjects in this study learned English naturalistically on first
exposure. All began their study of English in school.

The  argument  has  been  made  that  children  acquiring  a  second  language 
below  some  age  should  have  a  biological  or  neurological  advantage  over 
older  learners,  and  the  argument  is  usually  related  to  brain  maturation. 
However,  at  this  point  in  time,  there  are  no  final  data  to  suggest  that 
even  children  can  acquire  native  speaker  pronunciation. 

In  one  relevant  study,  Asher  &  Garcia  (1969)  looked  at  children's 
acquisition  of  pronunciation  to  see  if  age  of  first  exposure  made  a 
difference. Actually, the study looked at age of arrival in the U.S.
for Cuban Spanish speaking children rather than age of first exposure
which  is  not  necessarily  the  same  thing.   Even  children  born  into  a 
bilingual  community  in  the  U.S.  need  not  hear  English  very  early  in 
life,  especially  in  a  segregated  bilingual  community.   In  any  case, 
Asher  &  Garcia  (1969)  found  that  not  one  of  the  71  bilingual  children 
was  judged  to  have  a  native  American  English  accent  by  a  panel  of  native 
English  speaking  high  school  students,  though  many  of  the  speakers  were 
judged  to  have  "near-native"  accents. 

Flege  (1987),  in  a  study  discussed  earlier,  provided  evidence  that 
adults  could  learn  new  phones  and  could  modify  previously  used  phones  to 
approach  the  norms  for  that  phone  in  their  L2.   It  does  not  seem  to  be 
the case that children are clearly able to learn whatever pronunciation
norms exist in the L2 while adults are clearly not able to do so.
Both  children  and  adults  seem  to  be  capable  of  learning,  but  neither 
group  may  achieve  authentic  pronunciation. 

Table  10  provides  the  years  of  U.S.  residence  as  well  as  the  age  of 
first  exposure  to  native  English  speakers  in  school.   If  the  claim  being 
tested  in  this  study  is  correct,  no  bilinguals  will  acquire  an  authentic 
native  accent  no  matter  how  early  they  begin  learning  the  language. 

Judgements  of  Accent  Strength 

All  subjects,  American  and  nonnative,  were  rated  by  native  speakers 
of  American  English  for  strength  of  accent.   The  listeners  were  24 
students  from  an  undergraduate  introductory  phonetics  class  who  received 
extra  credit  in  the  course  for  their  participation.   The  rankings  made 
by three listeners who ranked native English speakers as having an
accent were deleted from the study. Mean ratings by 21 listeners were
used  to  produce  the  final  rating  scores. 

In  order  to  prepare  tapes  for  listener  rating,  samples  to  be  used  on 
the  tapes  were  taken  from  the  subjects'  readings  of  the  Rainbow  Passage 
(Fairbanks,  1960).  The  phrase  "his  friends  say  he  is  looking  for  the 
pot  of  gold  at  the  end  of  the  rainbow"  was  taken  for  each  subject  from 
the  reading  of  the  Rainbow  Passage.  A  phrase  from  the  Rainbow  Passage 
was  chosen  for  rating  because  it  was  felt  that  the  judgement  made  by  the 
raters  should  be  based  on  overall  speaking  ability.  Since  it  is  not 
possible  to  control  for  variability  related  to  judgements  based  on 
content  rather  than  on  accent  when  using  spontaneous  speech  samples  for 
rating  judgements,  the  phrase  from  the  Rainbow  Passage  was  used.   This 
particular phrase was selected because it is heavily loaded with stop
consonants.

A  set  of  five  tapes  was  prepared  for  rating.   Each  subject  was 
introduced  by  subject  number  on  each  tape.   Two  seconds  of  silence  were 
used  between  subject  number  and  speaker.  Five  seconds  of  silence  were 
given  for  judging.   Tape  1  contained  a  randomly  selected  15  of  the  30 
speakers  arranged  in  random  order.   This  tape  was  used  to  familiarize 
the  listeners  with  the  types  of  accent  to  be  heard  and  to  allow  the 
listeners to fix the boundaries for the ratings to be made. Tapes 2, 3,
4,  and  5  consisted  of  eight  speakers,  including  six  speakers  from  one 
language  and  two  of  the  native  American  English  speakers,  arranged  in 
random  order.   Random  order  for  all  tapes  was  achieved  by  using  the 
occurrence  of  the  subject  number  in  a  random  number  table.   The  judges 
heard the five tapes in two groups. Each group heard Tape 1 first.
Group 1 heard the tapes in the order 2-3-4-5, while Group 2 heard the
tapes  in  the  order  3-2-5-4. 
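
The selection and ordering of speakers for the familiarization tape can be sketched in code. The study itself used a random number table; the fragment below substitutes a seeded pseudorandom generator, which is a different technique with the same purpose. The subject identifiers, the seed, and the function name are hypothetical, and the tape orders for the two listener groups are copied from the description above.

```python
import random

# Hypothetical identifiers for 30 speakers; the actual subject numbers
# used in the study are not consecutive and are not reproduced here.
HYPOTHETICAL_IDS = list(range(1, 31))

# Tape orders heard by each listener group, with Tape 1 always first.
TAPE_ORDER = {"Group 1": [1, 2, 3, 4, 5], "Group 2": [1, 3, 2, 5, 4]}


def build_familiarization_tape(subject_ids, n_selected=15, seed=None):
    """Pick n_selected speakers without replacement, in random order."""
    rng = random.Random(seed)
    # random.Random.sample returns the chosen items in random order,
    # so selection and ordering are handled in one step.
    return rng.sample(subject_ids, n_selected)


familiarization = build_familiarization_tape(HYPOTHETICAL_IDS, seed=1988)
print(familiarization)
```

Fixing the seed makes the order reproducible, which plays roughly the role that recording the starting position in a random number table would play.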

Rating procedure was the same for each group. Prior to the
presentation of the tapes, the listeners were read the instructions and
familiarized with the rating scale (see Appendix A). Listeners were asked
to  rate  the  speakers  on  a  three  point  scale.   A  rating  of  1  was  given  if 
the  speaker  was  perceived  to  be  a  native  American  English  speaker.  A 
rating  of  2  indicated  that  the  judge  heard  a  slight  accent  and  a  rating 
of  3  indicated  a  perceived  medium  to  heavy  accent.   Listeners  heard  the 
tapes  in  a  classroom  setting.   The  recordings  were  played  back  over  a 
Sanyo  (Model  RDS40)  stereo  cassette  recorder  and  a  pair  of  high  quality 
loudspeakers.   During  the  familiarization  tape  (Tape  1),  a  check  was 
made  to  see  that  the  rating  time  between  subjects  was  adequate. 

Results  of  Rating  and  Discussion 

Raw scores for ratings were analyzed on a CMS system computer
at The George Washington University. A listing of the ratings and the mean
rating for each subject was obtained using a SAS (version 5.16) program.
The scores for three of the judges who rated the two American
English males as having a slight accent (2) were deleted, and the mean
ratings were obtained again.

Because  the  speakers  were  rated  in  2  groups,  a  two-way  analysis  of 
variance  (SASBMDP4V-1983)  was  performed  to  test  the  effects  of  language, 
rating group, and language by rating group. There was no statistically
significant difference between languages in rating. However, rating
group was found to be statistically significant (p < .03), as was rating
group by language (p < .002). Inspection of the means for the languages


and  groups  indicated  that  the  first  group  of  raters  tended  to  rate 
Arabic,  Korean,  and  Spanish  speakers  a  little  higher  than  the  second 
group  of  raters  but  rated  Thai  speakers  a  little  lower  than  the  second 
group  of  raters.   Mean  ratings  for  each  speaker  for  each  group  were 
reviewed  and  the  ratings  were  found  to  be  close  enough  so  that  the  same 
3  groups  of  nonnative  speakers  would  be  formed  using  mean  ratings  from 
the  first  raters,  the  second  raters,  or  both  sets  of  raters  combined. 

Reliability of raters was also determined using the intraclass
correlation statistic (Winer, 1971, p. 283). The correlation coefficient
produced by this procedure was 0.97.

The mean ratings were used to group the speakers into four categories,
which are listed in Table 11. Group 1 (English) contained the six
native American English speakers. Group 2 (Superior) contained subjects
whose mean rating was less than 2. This group consisted of seven speakers,
including 3 Spanish speakers, 2 Arabic speakers, 1 Korean speaker,
and  1  Thai  speaker.   One  of  the  Spanish  speakers  had  been  consistently 
rated  1  (English)  while  the  rest  had  a  few  2  (Superior)  ratings.   The 
speakers  in  this  group  were  judged  to  be  native  speakers  of  English  a 
minimum  of  8  times  and  a  maximum  of  21  times  out  of  21. 

Group 3 (Very Good) consisted of 4 Korean speakers, 2 Spanish speakers,
1 Arabic speaker, and 0 Thai speakers. Their mean rating fell
between  2  and  2.24.   Group  4  (Accented)  consisted  of  10  speakers,  only 
one  of  whom  was  ever  (once)  judged  to  be  a  native  speaker.  Their  mean 
rating  was  between  2.43  and  3.   The  Accented  group  consisted  of  5  Thai 
speakers,  3  Arabic  speakers,  1  Spanish  speaker,  and  1  Korean  speaker. 
Table  12  displays  a  summary  of  the  rating  results  with  the  new  groupings 
formed as described above. These new groupings were used in later




Table 11: Mean strength of accent ratings and groupings

Accent                     Mean       # of 1    # of 2    # of 3
Group   S#   Language      Rating     Ratings   Ratings   Ratings

  2      3   Spanish       1.00000      21         0         0
  2      7   Spanish       1.04762      20         1         0
  2      5   Spanish       1.09524      19         2         0
  2     32   Arabic        1.33333      14         7         0
  2     12   Korean        1.38095      13         8         0
  2     20   Thai          1.52381      10        11         0
  2     25   Arabic        1.61905       8        13         0
  3      4   Spanish       2.00000       1        19         1
  3     10   Korean        2.00000       3        15         3
  3      6   Spanish       2.09524       2        15         4
  3     13   Korean        2.14286       3        12         6
  3      9   Arabic        2.19048       1        15         5
  3     26   Korean        2.23810       0        16         5
  3     14   Korean        2.28571       5         4        11
  4     15   Thai          2.42857       0        12         9
  4     11   Korean        2.42857       1        10        10
  4     22   Arabic        2.42857       0        12         9
  4     16   Thai          2.57143       0         9        12
  4     24   Arabic        2.57143       0         9        12
  4     17   Thai          2.61905       0         8        13
  4     18   Thai          2.90476       0         2        19
  4      8   Spanish       2.95238       0         1        20
  4     19   Thai          3.00000       0         0        21
  4     23   Arabic        3.00000       0         0        21

Note: Group 2 = Superior, Group 3 = Very Good, Group 4 = Accented

statistical  analyses  that  compared  the  two  main  groups  of  interest  in 
this  study,  the  English  and  the  Superior  groups.   The  other  groups  were 
also  included  in  some  of  the  analyses  in  order  to  see  if  the  Superior 
group  was  different  in  some  way  from  the  Very  Good  and/or  the  Accented 
group. 
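
The mean ratings in Table 11 follow directly from the counts of 1, 2, and 3 ratings given by the 21 retained judges, and the grouping can be expressed as a simple rule. The sketch below is illustrative only. The function names are hypothetical, and the cut points are assumptions inferred from the ranges reported in the text (means are compared after rounding to two decimals); they are not a rule stated in the study.

```python
def mean_rating(n_ones, n_twos, n_threes):
    """Mean accent rating from the counts of 1, 2, and 3 ratings."""
    total = n_ones + n_twos + n_threes
    return (1 * n_ones + 2 * n_twos + 3 * n_threes) / total


def accent_group(mean):
    """Assign a nonnative speaker to a rated group.

    Assumed cut points: below 2 is Superior, 2 up to (but not including)
    2.43 is Very Good, and 2.43 or above is Accented.
    """
    m = round(mean, 2)
    if m < 2.00:
        return "Superior"
    if m < 2.43:
        return "Very Good"
    return "Accented"


# Counts taken from Table 11: subject 32 (Arabic) and subject 15 (Thai).
s32 = mean_rating(14, 7, 0)   # 28 / 21
s15 = mean_rating(0, 12, 9)   # 51 / 21
print(round(s32, 5), accent_group(s32))
print(round(s15, 5), accent_group(s15))
```
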

One  factor  that  was  not  accounted  for  in  ratings  of  accent  was 
prejudice  for  or  against  particular  language  groups.  The  productions  of 
one  speaker  may  be  rated  lower  than  the  productions  of  another  speaker 
simply  because  the  listener  tends  to  feel,  for  example,  that  a  Thai 
accent  sounds  more  foreign  than  a  Spanish  accent.  This  prejudice  may 
hold  even  for  the  lightest  accents.  More  research  needs  to  be  done  to 
determine  the  contribution  of  language  prejudice  to  judgements  of  accent 
strength. 

Overall, the Thai speakers seem to have been rated as less proficient
than any other group. Language prejudice may have played a part,
though it is more likely in this case that the subjects were less proficient.
Two other Thai speakers who had less accent were recorded but,
due  to  an  intermittent  short  in  the  microphone  cable  that  occurred 
during  taping,  could  not  be  included  in  the  study. 

Experimental  Procedures 
Test  Materials 

The  following  15  monosyllabic  words  were  chosen  for  this  study: 
beat   bit    big     Pete    peg   get   deed  teak 
bead   bid    Paig    paid    pick  date  did 


One  token  of  each  word  was  written  on  3x5  cards,  and  a  total  of  15 
cards  were  randomly  ordered  for  reading  by  subjects.   Instructions 
(Appendix  B)  were  read  in  English  by  the  researcher.   A  similar  set  of 
words  on  similar  cards  was  read  first  by  the  subjects  to  accustom  them 
to  the  task.   During  the  instructional  period,  the  subjects  read  aloud 
and  practiced  the  list  of  the  test  materials  to  accustom  them  to  those 
words.  This  was  done  because  the  focus  of  the  study  was  on  accurate 
pronunciation.  The  cards  were  shuffled  again  and  the  subjects  were 
recorded  reading  the  stimulus  materials  from  the  cards. 

Recording  Procedure 

Strict  recording  procedure  did  not  vary  from  subject  to  subject. 
However,  recording  room  did  vary.   Within  the  paradigm  of  the  strictest 
experimental phonetic research, subjects will be recorded only in specially
designed soundproof booths. In this way, the highest quality
recordings  can  be  made.   Sociolinguistic  field  research,  on  the  other 
hand,  has  produced  important  and  clear  findings  based  on  recordings  made 
(to give an extreme example) on street corners (Labov, 1966; Labov,
Yeager,  &  Steiner,  1972).  Though  the  best  conditions  are  desirable,  the 
phonetic  science  paradigm  should  not  be  taken  to  mean  that  recordings 
adequate  to  the  purpose  cannot  be  made  in  less  than  ideal  conditions.  In 
the  present  study,  quiet  conditions  were  arranged  for  each  recording. 
Sample spectrograms were made from recordings made in the different
situations and were found to be comparable. One set of recordings was
made in a soundproof booth in order to compare quality of recording with
those made in less perfect conditions. The spectrograms were comparable.



Recording  procedure  was  the  same  for  each  subject.  Subjects  were 
asked  questions  concerning  their  language  learning  background.   A  sample 
questionnaire  will  be  found  in  Appendix  C.   Questioning  was  done  in  the 
recording  situation  in  order  to  accustom  subjects  to  recording  and  to 
focus their attention on speech. Since the object of the recording was
to collect excellent speech, no attempt was made to distract the subjects'
attention from the task.

Following the questioning, the subjects read the practice words,
then practiced the test words, and were then recorded reading the test words.
The subjects then practiced and recorded a reading of the Rainbow Passage
(Fairbanks, 1960).

The  American  subjects  also  recorded  six  of  the  test  words  directly 
onto  a  computer  using  an  Audio-technica  (model  AT801)  microphone.   The 
microphone  fed  the  signal  into  the  amplifier,  digital  analyzer  and 
computer  system  described  below,  in  order  to  test  the  comparability  of 
the  acoustic  records  made  using  a  traditional  sound  spectrograph  and 
using a computer-generated sound spectrograph. The records were comparable,
although the computer analysis offered amplification of the
signal  that  could  be  used  to  examine  the  acoustic  record  more  closely. 
In  the  case  of  the  subjects  recording  directly  onto  the  computer,  only 
the  instructions  and  practice  of  the  test  words  were  repeated. 

Instrumentation 

Subjects  were  seated  about  6"  in  front  of  and  slightly  to  the  left 
of  the  microphone.   An  Electrovoice  (Model  361B)  microphone  and  an 
Electrovoice  (Model  RE15)  microphone  were  used.   Recordings  were  made 
using  a  TEAC  (model  X-3R)  tape  recorder  and  a  Nagra  (model  E)  tape 


recorder  operated  at  7  1/2  inches  per  second.  Wide  band  spectrograms 
were  made  using  a  Voice  Identification  (Model  700)  sound  spectrograph 
using  the  standard  300  Hz  wide  band  analyzing  filter.  All  samples  were 
later  re-measured  using  the  GW  Instruments  MacSpeech  Lab  (version  2.0). 
In this procedure, the utterances were passed from an Ampex (Model
AV-770) tape recorder through a TTE amplifier (Model 411AFS) incorporating
anti-aliasing filters and digitized at a 10 kHz sample rate by a GW
Instruments MacADIOS (Model 411) analyzer. The utterances were then
passed into a Macintosh SE computer with a MacSpeech Lab (version 2.0)
program. The MacSpeech Lab program also generates a speech waveform,
along with the spectrogram. The waveform was used to help determine
beginning  and  ending  points  of  the  acoustic  events  when  the  spectrogram 
was  not  clear. 

Acoustic  Analysis 

Measurements were made by hand to the nearest 5 ms on the traditional
spectrograms. All samples were later re-measured using the GW
Instruments  MacSpeech  Lab  (version  2.0).   Measurements  were  made  to  the 
nearest  5  ms  on  the  computer-produced  spectrograms.   Measurements  made 
by  hand  were  compared  with  measurements  made  on  the  computer  and  any 
discrepancies  were  re-measured  on  both  the  traditional  spectrograms  and 
on  the  computer  in  order  to  minimize  measurement  error. 

In  a  study  of  the  reliability  of  vowel  duration  measurement,  Allen 
(1978)  asked  seven  professional  phoneticians  to  measure  and  re-measure 
(after  a  delay)  vowel  durations  from  a  prepared  tape.  His  study  found 
variations  of  up  to  42  ms  between  and  within  judges  (with  extreme 


outliers  excluded).  He  concluded  that  the  95%  confidence  interval  for 
vowel  duration  measurement  can  be  expected  to  range  from  about  10  to  25 
ms depending on consonantal context. Context is important because
beginning and ending criteria for consonants may determine the measurement
of both consonants and vowels.

In  order  to  explore  the  reliability  of  the  measurement  criteria  used 
in  this  study,  21%  (126/900)  of  the  traditional  spectrograms  were 
re-measured by another judge who had had approximately one year's experience
in measuring spectrograms. All differences between measurements
were  within  5  to  25  ms  except  for  three.   These  three  involved 
differences  in  measurement  of  35,  70,  and  260  ms.   Inspection  of  the 
spectrograms  related  to  these  larger  differences  indicated  that  these 
differences  seemed  to  result  from  the  use  of  the  computer-analyzed 
spectrograms  and  waveforms  which  allowed  amplification  of  weak  traces 
that  might  not  have  been  clear  on  the  traditional  spectrograms.   For  the 
differences  which  ranged  from  5  to  25  ms  (60/126),  only  11  of  the  60 
were  between  15  and  25  ms.  The  rest  of  the  differences  were  5  or  10  ms. 
The  overall  mean  difference  in  measurement  for  the  900  measurements 
including  the  three  extremes  was  6.9  ms.  Without  the  extreme  outliers, 
the mean difference was 4.3 ms. Therefore, a set of reliable measurement
criteria seems to have been consistently applied.
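
A comparison of this kind between two judges can be summarized as sketched below. The function name and the sample values are hypothetical and are not data from this study. The 25 ms cut-off for an extreme difference is an assumption based on the range reported above.

```python
def agreement_summary(judge_a_ms, judge_b_ms, extreme_ms=25):
    """Mean absolute difference between two judges, with and without extremes."""
    diffs = [abs(a - b) for a, b in zip(judge_a_ms, judge_b_ms)]
    # Differences larger than extreme_ms are treated as extreme outliers.
    kept = [d for d in diffs if d <= extreme_ms]
    return {
        "mean_all": sum(diffs) / len(diffs),
        "mean_without_extremes": sum(kept) / len(kept),
        "n_extremes": len(diffs) - len(kept),
    }


# Made-up durations in ms for four tokens measured by two judges.
summary = agreement_summary([100, 85, 60, 120], [105, 85, 70, 190])
print(summary)
```
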

Criteria  for  Parameter  Measurement 

Three  types  of  measurements  were  made  for  the  purposes  of  this 
study.  Criteria  for  measurement  were  as  follows: 



1)  Voice  onset  time  (VOT)  was  measured  relative  to  the  burst  of  the 
stop.  There  were  four  operational  categories  of  VOT  used. 

a.  Lead  voicing  was  defined  as  three  or  more  pulses  prior  to  the 
burst  of  the  stop.  These  instances  received  a  negative  value  of  VOT. 

b.  Onset  of  voicing  at  the  burst  of  the  stop.   These  instances 
received  a  VOT  value  of  0. 

c.  Lag  voicing  was  defined  as  voicing  that  began  after  the  release 
of  the  stop.  These  instances  received  a  positive  VOT  value. 

d.  Continuous  voicing  was  defined  as  voicing  that  continued  unbroken 
from  the  preceding  vowel.  These  instances  also  received  a  negative  VOT 
value.  These  instances  were  noted  and  deleted  from  the  comparisons. 

2)  Vowel  Duration  was  measured  from  the  onset  of  the  vowel  formant 
energy  to  the  last  strong  striation  indicating  second  and  third  vowel 
formant  energy. 

3)  Consonant  Closure  Duration  was  measured  from  the  last  strong 
striation  in  the  second  and  third  vowel  formant  to  the  release  of  the 
stop.   In  some  instances,  the  final  consonant  was  unreleased  and  thus 
the  closure  continued  until  the  onset  of  vowel  energy  for  the  following 
vowel.  These  instances  were  noted  and  deleted  from  the  comparisons. 
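
The three criteria above can be restated as a short sketch that derives each measure from time marks read off a spectrogram. All names are hypothetical, and the time marks are assumed to be in ms on a common time axis. The text does not say how a token with fewer than three pulses of prevoicing was treated, so the sketch labels it "unclassified" rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StopMarks:
    burst_ms: float                      # time of the stop burst
    voicing_onset_ms: float              # time at which voicing begins
    prevoicing_pulses: int = 0           # glottal pulses seen before the burst
    continuous_from_vowel: bool = False  # voicing unbroken from preceding vowel


def classify_vot(m: StopMarks) -> Tuple[str, float]:
    """Return (category, VOT in ms), with VOT measured relative to the burst."""
    vot = m.voicing_onset_ms - m.burst_ms
    if m.continuous_from_vowel:
        return "continuous", vot   # noted and deleted from the comparisons
    if vot < 0:
        if m.prevoicing_pulses >= 3:
            return "lead", vot     # negative VOT
        return "unclassified", vot  # case not covered by the stated criteria
    if vot == 0:
        return "zero", vot
    return "lag", vot              # positive VOT


def vowel_duration(vowel_onset_ms: float, last_strong_striation_ms: float) -> float:
    """Onset of vowel formant energy to the last strong F2/F3 striation."""
    return last_strong_striation_ms - vowel_onset_ms


def closure_duration(last_strong_striation_ms: float,
                     release_ms: Optional[float]) -> Optional[float]:
    """Last strong striation to stop release; None marks an unreleased stop."""
    if release_ms is None:
        return None                # unreleased tokens were deleted
    return release_ms - last_strong_striation_ms
```
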

Analysis  of  Raw  Data 

Raw data were analyzed using a CMS system at The George Washington
University. The variables tested were the mean VOT for /p, b/, the mean
final consonant closure duration for /t, d/, and the mean vowel duration
for /i, I/. In addition, comparisons were made for the ratios of the /i/
and /I/ vowels before voiced and voiceless consonants and for the relative
duration of /t/ to /d/, regardless of preceding vowel.
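
A relative duration of this kind can be computed as a ratio of mean durations, as in the minimal sketch below. The function name and the sample values are hypothetical. The direction of the ratio (vowel before a voiced consonant divided by vowel before a voiceless consonant) is an assumption, since it is not stated here.

```python
from statistics import mean


def duration_ratio(numerator_ms, denominator_ms):
    """Ratio of the mean of one set of durations to the mean of another."""
    return mean(numerator_ms) / mean(denominator_ms)


# Made-up vowel durations in ms, for example /i/ in "bead" versus "beat".
before_voiced = [200, 220]
before_voiceless = [100, 110]
ratio = duration_ratio(before_voiced, before_voiceless)
print(ratio)
```
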

The  first  two  statistical  pre-tests  performed  were  tests  of  the 
data  to  see  if  the  data  met  the  assumptions  of  the  model  for  analysis  of 
variance.   The  first  test  performed  was  an  SAS  (version  5.16)  program 
which  obtained  the  Shapiro-Wilk  statistic  in  order  to  determine  if  the 
variable scores came from a normal distribution. Results indicated that
the distributions for /b/, /I/, and the /t/ to /d/ ratio departed
significantly from normality for some of the language groups.

The  second  pre-test  performed  was  an  SPSS-X  (version  2.0)  program 
using the nonparametric Wald-Wolfowitz Runs Test to determine if the
distributions of the values for the English group and the Superior group
were the same. This test revealed statistically significant differences
in  distributions  for  the  /d/  variable. 

For  the  variables  mentioned  above,  the  assumptions  necessary  to 
perform  a  parametric  analysis  of  variance  could  not  be  met.  Also, 
because  of  the  nonnormal  curves  revealed  for  /I/,  the  relative  duration 
of  both  vowels  before  voiced  and  voiceless  consonants  could  not  be 
assumed  to  have  met  the  assumptions  for  parametric  tests.   Therefore, 
these  variables  were  entered  into  a  nonparametric  one-way  analysis  of 
variance  (SAS  version  5.10)  which  used  the  Wilcoxon  scores  to  produce  a 
Kruskal-Wallis  Test  for  all  the  groups  combined  or  a  Wilcoxon  Rank-Sum 
Test for comparisons between the English and the Superior groups. Separate
nonparametric procedures were performed comparing all of the groups
and  comparing  the  English  and  Superior  groups. 
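
For readers unfamiliar with the rank-based procedure, the Kruskal-Wallis H statistic can be sketched as below. This is not the SAS procedure used in the study: it is a plain implementation of the textbook formula on pooled ranks, with tied values given average ranks and no tie correction applied. The sample values are made up.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Made-up values for three groups; rank sums are 6, 15, and 24.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4))
```

H is then referred to a chi-square distribution with one fewer degrees of freedom than the number of groups.
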

For  the  other  variables,  one-way  general  linear  model  analysis  of 
variance  (SAS  version  5.16)  procedures  were  performed  using  planned 



comparisons  of  English  versus  Superior  groups,  Superior  versus  other 
groups,  and  English  versus  other  groups. 

Student's  t  tests  were  also  performed  on  the  /I/  and  /i/  vowels  and 
on  the  /t/  and  /d/  consonants  to  see  if  the  groups  were  differentiating 
these  phones  based  on  length. 

The next chapter will present the results of the statistical analyses
and related acoustical data.


CHAPTER  4 
RESULTS 

Introduction 

The claim has been made (Flege & Hillenbrand, 1984) that nonnative
speakers of English will differ from native speakers with regard to
such phonetic specification factors as the phonetic duration parameters
for the production of sounds in English. This claim is one aspect
of the Phonetic Interference Model, first proposed by Flege in 1979,
and developed in his more recent work (see Chapter 1). The claim, as
formulated by Flege and Hillenbrand (1984), states that there are
limits on the phonetic accuracy that will be achieved by nonnative
speakers in their second language. These limits result from the identification
of sounds in L2 with sounds in L1.

For  example,  Spanish  speakers  have  a  /p/  in  Spanish,  produced 
according  to  some  type  of  articulatory  or  perceptual  target  (the  nature 
of which is not clear), and the Spanish /p/ differs in phonetic specification
from the English /p/. When native Spanish speakers learn
English,  they  identify  the  English  /p/  with  the  Spanish  /p/.   Learning 
may  take  place  and  changes  in  the  phonetic  specification  for  /p/  may 
take  place.   The  claim  is,  however,  that  these  native  Spanish  speakers 
will never achieve an authentic English /p/ because of phonetic interference
from the Spanish /p/. Flege has recently defined "authentic"
(Flege  &  Eefting,  1987)  as  no  statistically  significant  difference 
between  the  native  and  the  nonnative  speakers'  productions.   Even 


superior  bilinguals,  which  would  include  speakers  having  no  perceptible 
accent,  would  be  expected,  if  this  claim  is  valid,  to  differ  from 
native  English  speakers  in  phonetic  specification  where  their  native 
languages  have  phonetic  specifications  that  do  not  match  the  phonetic 
specifications  of  English. 

The  present  study  is  concerned  with  testing  this  claim.   One  aspect 
of  phonetic  accuracy  is  the  accuracy  of  timing  of  phonetic  events. 
Phonetic  authenticity  of  duration  parameters  was  investigated  for 
native  speakers  of  Arabic,  Korean,  Spanish,  and  Thai  who  can  be 
regarded  as  superior  speakers  of  English  as  a  second  language. 

In  the  present  study,  data  were  gathered  on  productions  in  English 
by  nonnative  speakers  who  ranged  from  good  to  superior  in  speaking 
ability.   In  addition,  the  speakers  were  rated  on  accent  by  native 
English speakers. Production and rating data were used to make statistical
comparisons between nonnative speakers speaking English and
native  English  speakers.  Statistical  analysis  focused  on  comparisons 
between  the  best  nonnative  speakers  and  the  native  English  speakers. 
The  phonetic  parameters  compared  included  VOT,  voiced  and  voiceless 
consonant  closure  duration,  general  vowel  duration,  and  vowel  duration 
relative  to  the  voicing  of  the  following  consonant.   This  chapter  will 
present the results of the acoustic and statistical analyses. Discussion
of results for each phonetic parameter is included.

The findings of the present study will be presented in the following
order. An overview of results will be given first. The next two
major  sections  will  display  and  discuss  results  for  initial  and  final 
stop  consonants.   The  last  two  sections  will  display  and  discuss 
results  for  vowel  duration  and  for  relative  vowel  duration.   Within 



each  major  section,  the  first  part  will  offer  an  overview  of  the 
results for each language group. Previously published data generally
looked at only one language group at a time and did not combine the
speakers into mixed groups on the basis of perceived accent. Therefore,
the data for language groups can most easily be compared to
previously  published  data.   No  statistical  analyses  were  carried  out 
for  the  language  groups  because  that  was  not  within  the  purposes  of  the 
study. 

The second part of each section will present results for the speakers
in rated groups. Statistical analyses that compared groups of
rated  speakers  will  be  found  in  this  part.   The  rated  speakers  were 
divided  into  four  groups,  labeled  native  English  speakers,  Superior 
nonnative  English  speakers,  Very  Good  nonnative  English  speakers,  and 
Accented  nonnative  English  speakers.  The  rated  groups  were  formed 
using the rating procedure discussed in Chapter 3. Data and statistical
analyses will be presented for all rated speaker groups for purposes
of comparison.

The next chapter will discuss the results. Interpretation of the
results  of  the  present  study  can  best  be  made  by  relating  findings  from 
language  groups,  rated  speaker  groups,  and  individuals. 

Overview  of  Results 
Superior  bilinguals  were  not  found  to  be  significantly  different 
from  native  English  speakers  on  eight  of  the  nine  measures  tested  in 
this  study.   The  mean  Voice  Onset  Times,  mean  vowel  durations,  and  mean 
consonant  closure  durations  produced  by  the  Superior  bilinguals  were 
not  statistically  different  from  those  durations  produced  by  native 


English  speakers.   Superior  speakers  were  significantly  different  from 
native  English  speakers  on  the  ratio  of  the  duration  of  a  vowel  before 
a  voiced  and  before  a  voiceless  consonant. 

There  were  significant  differences  found  when  all  four  groups  of 
native  and  nonnative  speakers  were  compared.  These  differences 
occurred  on  VOT  for  /b/,  consonant  closure  duration  for  /t/  and  /d/, 
the  length  of  the  /I/  vowel,  and  the  ratio  of  the  duration  of  a  vowel 
before  voiced  and  voiceless  consonants.   No  differences  were  seen 
between  the  native  and  nonnative  speaker  groups  for  VOT  for  /p/,  the 
duration  of  the  /i/  vowel,  or  the  relative  lengths  of  /t/  and  /d/. 

The  results  of  this  study  do  not  support  the  hypothesis  that  there 
is  a  limit  on  the  accuracy  that  nonnative  speakers  will  be  able  to 
achieve in production of L2 subphonemic durations. The Superior nonnative
speakers have been found to be similar to native English speakers
in  this  study. 

Initial  Stop  Voicing 
VOT  for  Language  Groups 

The  first  timing  parameter  examined  in  this  study  was  the  extent  to 
which  nonnative  speakers  of  English  had  acquired  the  duration 
specifications for initial voiced and voiceless bilabial stop consonants
in English. Production of these sounds in the test materials
used  in  this  study  indicated  that,  as  expected,  some  changes  had  been 
made  in  how  all  nonnative  speakers,  regardless  of  rating,  specified 
these  sounds.  Table  12  and  Figure  3  present  the  mean  VOT  and  range 
found  for  /b/  and  /p/  for  all  speakers  of  each  language  in  this  study. 


Table 12: Mean VOT and range of means in ms for initial bilabial
stops in sentences in English by native language

Native          VOT mean (s.d.)          Range of Means           n
Language       /b/          /p/          /b/        /p/        /b/   /p/

English       + 9 ( 4)    +70 (13)       1:14      61:95        30    30
Arabic        -49 (64)    +35 (24)    -133:14      11:67        17    30
Korean        +13 ( 5)    +81 ( 9)       4:19      72:97        30    30
Spanish       -14 (49)    +42 (17)    -113:11      16:66        28    30
Thai          -56 (69)    +58 (16)    -139:9       38:80        26    30

[Figure 3 plot not reproduced: horizontal bars showing the range of VOT for /b/ and /p/ for each native language (English, Arabic, Korean, Spanish, Thai); horizontal axis: time of onset in ms, from -240 to +120.]

Figure 3: Range of VOT in ms for initial bilabial stops in
sentences in English by native language from Table 12



Note that the standard deviations and ranges around the mean productions
differed across the language groups and, especially for /b/, among
speakers of the same language. Range of VOT reported in
other  studies  is  also  large  (see  Chapter  2). 

Figure  4  presents  published  data  (from  Tables  2  and  3  in  Chapter  2) 
combined  with  data  from  Figure  3  on  range  of  VOT  for  easier  comparison. 
The  values  in  the  present  study  were  derived  from  the  pooled  means  and 
range  of  means  for  each  language  group. 

VOT  for  Superior  Speakers 

Seven  speakers  in  this  study  were  judged,  in  the  rating  procedure 
described  in  Chapter  3,  to  be  Superior  speakers  of  English  as  a  second 
language.   Since  the  focus  of  the  present  study  is  on  acquisition  of 
timing  parameters  by  superior  bilinguals,  data  for  this  group  were 
compared  statistically  with  data  for  native  speakers  of  English.   In 
order  to  examine  the  data  more  closely,  the  other  rated  speaker  groups 
were  included  in  the  comparisons.  Table  13  and  Figure  5  display  mean 
VOT  and  range  of  VOT  for  all  of  the  rated  speaker  groups. 

An  analysis  of  variance  using  planned  comparisons  between  the 
English  speakers  and  the  nonnative  speakers,  between  the  Superior 
speakers  and  the  other  nonnative  speakers,  and  between  the  English  and 
Superior  speakers,  related  to  /p/  production  showed  no  significant 
differences  between  the  groups,  regardless  of  rating.   The  mean  VOT 
values  were,  however,  generally  longer  on  /p/  for  the  English  speakers 
than  for  the  nonnative  speakers  in  English. 



[Figure 4 plot not reproduced: horizontal bars showing the range of VOT for initial bilabial stops (/b/, /p/, and, for Korean and Thai, /ph/), pairing starred published ranges for English, Arabic, Korean, Spanish, and Thai with the unstarred ranges found in this study for English and for Arabic, Korean, Spanish, and Thai speakers in English; horizontal axis: time of onset in ms, from -240 to +120.]

Figure 4: Range of VOT for initial bilabial stops from Tables 2 and 3
and from the present study by language group.

Note: All VOT values from Lisker and Abramson (1964) are for sentential
productions except Spanish, which is from isolated words.

* VOT are published data, taken from Tables 2 and 3, Chapter 2.
Unstarred data are from this study.




Table 13: Mean VOT and range of means in ms for initial bilabial stops
in sentences in English by speaker group

Speaker          VOT mean (s.d.)           Range of Means           n
Group           /b/           /p/          /b/        /p/        /b/   /p/

English        + 9 * ( 4)   +71 (13)       1:14      61:95        30    30
Superior       - 8 * (52)   +49 (26)    -125:19      16:97        26    35
Very Good      - 4 * (37)   +62 (25)     -86:16      11:86        35    35
Accented       -55 * (64)   +53 (24)    -139:9       16:85        29    35

Note: * significant differences (p < .0087) between entire set on /b/
but not between English and Superior


[Figure 5 plot not reproduced: horizontal bars showing the range of VOT for /b/ and /p/ for each speaker group (English, Superior, Very Good, Accented); horizontal axis: time of onset in ms, from -240 to +120.]

Figure 5: Range of VOT in ms for initial bilabial stops in sentences
in English by speaker group from Table 13



Differences between the rated groups did, however, reach significance
on VOT for /b/ (Kruskal-Wallis Test, p < .0087). A
separate analysis (Wilcoxon Rank-Sum Test) of the differences between
the native English speakers and the Superior speakers did not show
significant differences on VOT for /b/.

Data  were  also  gathered  on  some  variations  in  voicing  for  /b/  by 
the  rated  speaker  groups  (see  Tables  14  and  15).  Lead  voicing  was  not 
used  by  the  native  English  speakers  in  this  study,  although  it  is  one 
of  the  options  for  /b/  articulation  in  English.  Another  option  in 
English  is  to  continue  the  voicing  of  the  vowel  preceding  the  voiced 
stop  consonant  until  the  burst  of  the  stop  is  reached.   This  type  of 
voicing will be referred to as thru voicing in the present study.
Tokens  of  stops  produced  with  continuous  or  thru  voicing  are  usually 
deleted  from  studies  of  the  present  type.   It  will  be  argued  that  stops 
made  with  thru  voicing  should  be  included  in  reported  data.   Tables  14 
and  15  display  lead  and  thru  voicing  data  for  the  language  groups  and 
for  the  rated  speaker  groups. 

Individual  VOT 

Table  16  shows  the  mean  VOT  and  range  of  mean  VOT  for  the  native 
English  speakers  and  the  mean  VOT  and  range  for  the  individual  Superior 
nonnative  speakers.  These  data  and  those  individual  data  that  follow 
were calculated by taking the means and ranges of the individual
speakers' measurements rather than pooled speaker group means. Values
for  native  English  speakers  in  Table  16  are  the  mean  and  range  of  means 
given  earlier  for  the  English  group. 
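
The distinction between a mean of individual speaker means and a pooled
group mean can be sketched as follows. This is an illustrative
reconstruction of the arithmetic only; the speaker labels and VOT
tokens are hypothetical, and the function names are not from the study.

```python
from statistics import mean


def summarize_by_speaker(tokens_by_speaker):
    """Average each speaker's tokens first, then summarize those means."""
    speaker_means = {
        speaker: mean(tokens) for speaker, tokens in tokens_by_speaker.items()
    }
    values = list(speaker_means.values())
    return {
        "speaker_means": speaker_means,
        "mean_of_means": mean(values),
        "range_of_means": (min(values), max(values)),
    }


def pooled_mean(tokens_by_speaker):
    """Average every token together, ignoring which speaker produced it."""
    all_tokens = [t for tokens in tokens_by_speaker.values() for t in tokens]
    return mean(all_tokens)


if __name__ == "__main__":
    # Hypothetical VOT tokens (ms); speakers contribute unequal token counts.
    tokens = {"A": [5, 10, 15], "B": [0, 0, 0, 20], "C": [20, 20]}
    print(summarize_by_speaker(tokens))
    print(pooled_mean(tokens))
```

When speakers contribute different numbers of usable tokens, as happens
here when thru-voiced tokens are set aside, the two summaries can
differ, which is why the choice matters.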




Table 14: Lead and thru voicing of /b/ in English by native language
(n=30)


Native     # of Ss  % words  Range of     # of Ss  % words  Range of
Language   Lead     Lead     Lead (ms)    Thru     Thru     Thru (ms)


English    0/6      0%       --           0/6      0%       --

Arabic     6/6      33%      -175:-70     5/6      43%      -170:-85

Korean     0/6      0%       --           0/6      0%       --

Spanish    1/6      1%       -175:-80     1/6      17%      -235:-140

Thai       3/6      33%      -205:-80     1/6      13%      -125:-155

Table 15: Lead and thru voicing of /b/ in English by speaker group


Speaker    # of Ss  % words  Range of     # of Ss  % words  Range of
Group      Lead     Lead     Lead (ms)    Thru     Thru     Thru (ms)


Superior   1/7      3%       -125         3/7      26%      -155:-85

Very Good  1/7      14%      -90:-70      0/7      0%       --

Accented   5/10     49%      -200:-80     4/10     20%      -235:-100



Table 16: Mean VOT and range in ms for initial bilabial stops in
English for all English speakers and for Superior speakers


Speakers      VOT             Range of Means       n
              /b/     /p/     /b/      /p/         /b/  /p/


English       +9      +71     0:20     50:110      30   30

Spanish #3    +10     +51     0:40     40:65       5    5

Spanish #7    +11     +33     10:15    15:55       5    5

Spanish #5    +6      +40     0:10     40:40       5    5

Arabic #32    +14     +67     0:20     40:90       4    5

Arabic #25    +10     +16     10       0:55        1    5

Korean #12    +19     +97     10:25    80:120      5    5

Thai #20      -125    +38     -125     30:65       1    5

Note: Speakers #20 and #25 had only one /b/ that was not thru-voiced



Final  Consonant  Closure 

Consonant  Closure  for  Language  Groups 

The  second  timing  parameter  examined  in  this  study  was  the  extent 
to  which  speakers  of  English  as  a  second  language  had  acquired  the 
duration  specification  for  the  length  of  the  closure  of  word  final  stop 
consonants.   In  English,  final  /t/  is  generally  longer  than  final  /d/. 
Table  17  presents  data  from  the  present  study  on  mean  closure  duration 
of  final  alveolar  stops,  mean  /t/  and  /d/  closure  duration  and  the  mean 
/d-t/  ratio  for  all  of  the  speakers  in  this  study  by  language  group. 
The  /d-t/  ratio  measures  whether  or  not  the  speakers  produced  /t/  and 
/d/  with  different  lengths. 

If  we  examine  the  ways  in  which  the  individuals  in  the  groups 
produced  /t/  and  /d/,  a  slightly  different  picture  emerges.   Table  18 
presents  the  data  for  the  relationship  between  /t/  and  /d/  productions 
for  the  speakers  of  each  language. 

The  overall  closure  duration  differences  may  be  related  to  the 
tendency  of  the  American  English  speakers  in  this  study  (and  generally) 
to  flap  /t,  d/  in  intervocalic  position.  Table  19  presents  flapping 
data  by  language  group. 

Consonant  Closure  for  Superior  Speakers 

Seven  Superior  bilinguals  were  compared  statistically  with  the 
native English speakers to determine if their mean stop closure
duration, mean /t/, mean /d/, and mean /d-t/ ratios were significantly




Table 17: Mean closure duration, mean /t/,/d/ closure duration in ms
and mean /d-t/ ratio for final alveolar stops in sentences
in English by native language


Native     Closure Duration   /t/            /d/            /d/    n
Language   mean  s.d.         mean  s.d.     mean  s.d.     /t/    /t/  /d/


English     46   ( 9)          54   (17)      40   ( 4)     0.82   27   30

Arabic     102   (27)         100   (30)     105   (30)     1.08   30   30

Korean      75   (23)          85   (28)      66   (20)     0.79   28   30

Spanish     87   (46)          90   (55)      84   (38)     1.04   29   29

Thai       108   (37)         120   (37)      96   (47)     0.82   30   30


Table 18: Speaker production of mean final /t/ and mean final /d/
in English by language group


Native                                                 Mean /t-d/
Language           /t/ = /d/   /t/ > /d/   /t/ < /d/   Difference (ms)

English speakers   1           4           1           16

Arabic speakers    1           2           3           24

Korean speakers    1           5           0           19

Spanish speakers   2           2           2           18

Thai speakers      1           4           1           30

Note: mean /t-d/ difference was derived from absolute values



Table 19: Number of flapped /t/ and /d/ in English by native language
and by number of subjects


Native     # of /t/    # of S         # of /d/    # of S
Language   flapped     flapped /t/    flapped     flapped /d/


English    9/25        4/6            22/30       6/6

Arabic     0/29        0/6            1/30        1/6

Korean     6/21        4/6            9/29        4/6

Spanish    9/29        3/6            10/29       3/6

Thai       0/29        0/6            4/30        1/6


different  from  those  values  for  the  native  English  speakers.   Table  20 
presents  comparison  data  by  speaker  group.   Table  21  presents  manner  of 
production  data  for  /t/  and  /d/  for  the  rated  speaker  groups. 

Separate t tests (Student's t) for each rated speaker group indicated
that none of the groups significantly differentiated /t/ from /d/.
The general tendency seen in this study was for the groups to produce
the consonants with equal length but, perhaps, for different reasons
for each group of speakers. And, although the consonants were produced
with the same closure duration statistically, the length of the
individual consonants varied.

Analysis of variance using a set of planned comparisons between the
English and nonnative groups, between the English and Superior groups,
and between the Superior and other nonnative groups on /t/ indicated
that the English speakers were significantly different from some of the
groups of nonnative speakers (p < .0191). The Superior group was not
significantly different from the other nonnative groups, nor was it
significantly different from the English speaker group on /t/.

For /d/, a significant difference was seen between the rated
speaker groups (Kruskal-Wallis Test, p < .0021). But, in a separate
analysis (Wilcoxon Rank-Sum Test), the Superior group was not
significantly different from the native English speakers.

The  /d-t/  ratio  was  not  found  to  be  significantly  different  for  the 
rated  groups  or  for  English  versus  Superior  speakers  (Kruskal-Wallis 
Test,  Wilcoxon  Rank-Sum  Test).   Again,  variances  were  large  in  some 
cases. 




Table 20: Mean closure duration, mean /t/,/d/ closure duration in ms
and mean /d-t/ ratio for final alveolar stops in sentences
in English by speaker group


Speaker     Stop Closure    /t/              /d/              /d/    n
Group       mean  s.d.      mean  s.d.       mean  s.d.       /t/    /t/  /d/


English      46   ( 9)       54 *  (17)       40 +  ( 4)      0.82   25   30

Superior     67   (33)       71 *  (39)       64 +  (28)      0.99   29   30

Very Good    91   (28)       95 *  (35)       88 +  (26)      0.95   22   29

Accented    112   (30)      121 *  (30)      104 +  (40)      0.88   29   30


Note: * significant differences between all groups on /t/ (p < .0191)
but the Superior group was not significantly different from
English speakers or from Very Good and Accented speakers

+ significant differences between all groups on /d/ (p < .0021)
but the Superior group was not significantly different from
the English speakers


Table 21: Speaker production of mean final /t/ and mean final /d/
in English by speaker group


Speaker                                          Mean /t-d/
Group       /t/ = /d/   /t/ > /d/   /t/ < /d/    Difference (ms)


English     1           4           1            16

Superior    2           3           2            15

Very Good   1           3           3            21

Accented    3           5           2            29


Note: mean /t-d/ difference was derived from absolute values



Overall  closure  duration  clearly  increases  with  distance  from 
native  English  in  terms  of  rating  of  accent.   Native  American  English 
speakers  used  very  short  final  stops  in  this  study,  as  can  be  seen  in 
Table  20.   Flapping  also  seems  to  play  a  part  in  this  effect.   Flaps 
are  usually  defined  as  alveolar  stop  productions  that  are  40  ms  or  less 
in  duration.   Table  22  displays  flapping  data  for  the  rated  speaker 
groups. 
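
The 40 ms criterion can be applied mechanically to a list of closure
durations, as in the following minimal sketch. It assumes that "40 ms
or less" is the only criterion for a flap; the token values and
function names are hypothetical and serve only to show how means
including and excluding flaps are obtained.

```python
from statistics import mean

FLAP_MAX_MS = 40  # threshold quoted in the text: 40 ms or less counts as a flap


def is_flap(closure_ms, threshold=FLAP_MAX_MS):
    """Classify one alveolar stop token by closure duration alone."""
    return closure_ms <= threshold


def closure_summary(closures_ms):
    """Count flaps and report mean closure with and without flapped tokens."""
    non_flaps = [c for c in closures_ms if not is_flap(c)]
    return {
        "n_tokens": len(closures_ms),
        "n_flaps": len(closures_ms) - len(non_flaps),
        "mean_including_flaps": mean(closures_ms),
        # None when every token was flapped, so no non-flap mean exists.
        "mean_excluding_flaps": mean(non_flaps) if non_flaps else None,
    }


if __name__ == "__main__":
    # Hypothetical closure durations (ms) for one speaker's final stops.
    print(closure_summary([25, 30, 40, 65, 90]))
```

A speaker who flaps every token has no mean excluding flaps, which is
why that value is left undefined rather than reported as zero.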

Superior  speakers  tended  to  flap  more  often  than  the  other  rated 
groups.   None  of  the  Accented  group  flapped  final  /t/.   Flapping  by  the 
Superior  group  probably  influenced  the  finding  of  difference  between 
the  Superior  group  and  the  other  groups.   However,  excluding  tokens  of 
flapped  /t/  and  /d/,  the  mean  duration  of  final  consonant  closure  for 
the  groups  still  exhibits  an  increase  as  rated  distance  from  native 
English  increases  as  can  be  seen  in  Table  23.  Table  23  presents  mean 
consonant  closure  duration,  mean  /t/,  mean  /d/,  and  the  mean  /t-d/ 
difference  by  speaker  group  excluding  flapped  tokens. 

Individual  Consonant  Closure 

The Superior group was made up of 7 speakers from 4 different
native languages. Table 24 displays the mean closure duration,
including and excluding flaps along with the number of flapped tokens
for each speaker, for the native English speakers and for the set of
Superior bilinguals.

Table  25  presents  data  on  the  individual  sounds  including  flaps  for 
all  the  Superior  speakers,  in  order  of  rating  with  the  higher  rated 
nonnative  speakers  first.   Table  26  shows  the  mean  closure  duration, 




Table 22: Number of flapped /t/ and /d/ in English by speaker group
and by number of subjects


Speaker     # of /t/    # of S         # of /d/    # of S
Group       flapped     flapped /t/    flapped     flapped /d/


English     9/25        4/6            22/30       6/6

Superior    12/32       4/7            14/34       4/7

Very Good   3/27        3/7            5/35        4/7

Accented    0/49        0/10           5/50        2/10


Table 23: Mean final stop closure duration, mean /t/, mean /d/, the
mean /t-d/ ratio and the difference between /t/ and /d/ in ms
excluding flapped /t, d/ in English by speaker group


Speaker     Mean                                  /d/     n
Group       Closure   /t/    /d/    /t/-/d/       /t/     /t/  /d/

English      62        67     51    16            0.76    16    8

Superior     95        98     93     4            0.96    19   20

Very Good   105       113     98    15            0.87    18   29

Accented    116       121    112     9            0.93    50   45




Table 24: Closure duration including and excluding flaps for all
English speakers and for all Superior speakers listed in
order of strength of native accent rating


              Mean Closure in ms                     # Flaps
Speaker       Including Flaps   Excluding Flaps      /t/     /d/


English        46                62                  9/25    22/30

Spanish #3     65                92                  1/4     2/4

Spanish #7     18                --                  5/5     5/5

Spanish #5     68               130                  3/5     3/5

Arabic #32     80                80                  0/5     0/5

Korean #12     44                78                  3/5     4/5

Thai #20       76                76                  0/5     0/5

Arabic #25    123               123                  0/5     0/5


Table 25: Mean closure duration, mean /t/,/d/ closure duration in ms,
and /d-t/ ratio for final alveolar stops in sentences
in English for all English speakers and for all Superior
speakers listed in order of strength of accent rating.


              Stop                        /d-t/    n
Speaker       Closure   /t/     /d/       ratio    /t/  /d/

English        46        54      40       0.82     25   30

Spanish #3     65        59      71       1.21      4    4

Spanish #7     18        15      20       1.33      5    5

Spanish #5     68        67      68       1.01      5    5

Arabic #32     78        87      73       0.84      3    5

Korean #12     44        53      34       0.64      5    5

Thai #20       76        69      82       1.19      5    5

Arabic #25    123       145     101       0.70      5    5




Table 26: Mean closure duration, mean /t/ and /d/, and the difference
between /t/ and /d/ in ms excluding flapped /t, d/ for all
English speakers and for all Superior speakers


              Mean                                  n
Speaker       Closure   /t/     /d/     /t/-/d/     /t/  /d/


English        62        67      51      16         16    8

Spanish #3     92        82     120     -38          3    2

Spanish #7     --        --      --      --         --   --

Spanish #5    130       125     135     -15          2    2

Arabic #32     80        87      73      14          3    5

Korean #12     78        83      60      23          2    1

Thai #20       76        69      82     -13          5    5

Arabic #25    123       145     101      44          5    5



mean  /t,  d/  and  the  difference  between  /t/  and  /d/  for  the  native 
English  speakers  and  for  the  nonnative  Superior  speakers  with  flapped 
tokens  excluded. 

Data  were  also  gathered  concerning  unreleased  final  alveolar  stop 
consonants. Table 27 presents these data by language group. Table 28
shows  unreleased  final  /d,  t/  by  rated  speaker  group. 

Vowel  Duration 

Vowel  Duration  for  Language  Groups 

The third timing parameter examined in this study was the extent to
which nonnative speakers of English had acquired the duration
specifications for vowels in English. Table 29 presents the overall
mean vowel duration and the mean individual vowel durations for /i/
and /I/ for the English speakers and for all of the nonnative speakers
in English words by language group. This table also includes the mean
/I-i/ ratio for these speakers. This ratio measures whether or not the
speakers produced /I/ and /i/ with different lengths. These means and
those that follow in this section were derived from vowels averaged
across a following voiced or voiceless consonant. Vowel duration
differences before voiced and voiceless consonants will be discussed in
the next major section. Note that the variances around the mean
productions often do not resemble the values for other groups.

Vowel  Duration  for  Superior  Speakers 

The  seven  Superior  speakers  were  compared  statistically  with  the 
native  English  speakers  in  the  present  study.   Table  30  presents  the 



Table 27: Unreleased final /t/ and /d/ in English by native
language and by number of subjects

Native     # of /t/     # of S            # of /d/     # of S
Language   unreleased   unreleased /t/    unreleased   unreleased /d/


English    5/30         3/6               0/30         0/6

Arabic     1/30         1/6               0/30         0/6

Korean     8/30         4/6               1/30         1/6

Spanish    1/30         1/6               1/30         1/6

Thai       1/30         1/6               0/30         0/6


Table 28: Unreleased final /t/ and /d/ in English by speaker group
and by number of subjects

Speaker     # of /t/     # of S            # of /d/     # of S
Group       unreleased   unreleased /t/    unreleased   unreleased /d/


English     5/30         3/6               0/30         0/6

Superior    3/35         2/7               1/35         1/7

Very Good   8/35         4/7               1/35         1/7

Accented    1/50         1/10              0/50         0/10




Table 29: Mean vowel duration, mean /I/, /i/ in ms, and mean /I-i/
ratio in English by native language


Speaker    Vowel Duration    /I/             /i/             /I-i/
Group      mean  s.d.        mean  s.d.      mean  s.d.      ratio

English    149   (18)        139   (17)      159   (20)      0.88

Arabic     135   (30)        125   (25)      143   (31)      0.87

Korean     168   (18)        161   (18)      176   (19)      0.92

Spanish    142   (30)        139   (29)      145   (19)      0.97

Thai       160   ( 9)        142   (12)      179   (19)      0.80



Table 30: Mean vowel duration, mean /I/, /i/ in ms, significance
level of difference between /I/ and /i/, and /I-i/ ratio
in English by speaker group


Speaker     Vowel Duration   /I/              /i/            sign.    /I-i/
Group       mean  s.d.       mean  s.d.       mean  s.d.     diff.    ratio

English     149   (18)       139 *  (17)      159   (20)     .0003    0.88

Superior    140   (16)       134 *  (16)      146   (18)     .046     0.92 +

Very Good   169   (22)       164 *  (20)      174   (26)     n.s.     0.95 +

Accented    147   (29)       132 *  (26)      161   (34)     .0003    0.83 +

Note: * significant differences (p < .0241) between the groups on /I/

+ significant differences (p < .002) between Superior and other
nonnative groups, although Superior was not significantly
different from English



mean  overall  vowel  duration,  mean  /I/,  mean  /i/,  and  mean  /I-i/  ratio 
for  the  rated  speaker  groups.   These  means  and  the  ratios  were  derived 
from  vowels  averaged  across  a  following  consonant. 

In separate t tests (Student's t), all groups except the Very Good
group were producing /I/ and /i/ with statistically significantly
different lengths (English p < .0003; Superior p < .046; Accented
p < .0003).

For the rated speaker groups, analysis of variance indicated
significant differences between the speaker groups on /I/
(Kruskal-Wallis Test, p < .0241). Separate analysis (Wilcoxon Rank-Sum
Test) indicated no difference between the English speakers and the
Superior speakers on /I/. There were no significant differences in an
analysis of variance between the groups on /i/.

A significant difference also occurred in an analysis of variance
using a set of planned comparisons between the English and nonnative
speaker groups, between the Superior and other nonnative speaker groups
and between the English and Superior groups for the vowel ratios.
There was no significant difference between the English and the
Superior speakers on vowel ratio, nor between the English speakers and
the three groups of rated nonnative speakers on vowel ratio.
Significant differences were seen, however, between the Superior
speakers and the other nonnative speakers (p < .002) on the vowel
ratio.

Individual  Vowel  Duration 

Three  native  speakers  of  Spanish  were  the  most  highly  rated  of  the 
Superior  group.   Table  31  presents  vowel  duration  data  for  all  Superior 
speakers  and  data  on  the  mean  range  of  /I/,  /i/,  and  mean  /I-i/  ratio 
for  the  English  speakers. 




Table 31: Mean overall vowel duration, mean /I/, /i/ in ms, and /I-i/
ratio in English for all English speakers and for Superior
speakers, with range for English included


              Vowel                                  /I/
Speaker       Duration   /I/               /i/       /i/


English       149        139 (118-168)     159       0.88 (0.84-0.91)

Spanish #3    141        143               138       1.04

Spanish #7    116        105               127       0.83

Spanish #5    133        133               132       1.01

Arabic #32    128        120               135       0.89

Arabic #25    162        152               171       0.89

Korean #12    149        146               152       0.96

Thai #20      153        137               169       0.81



Vowel/Consonant  Ratio 

Vowel/Consonant  Ratio  for  Language  Groups 

The fourth timing parameter examined in this study was the
relationship between the length of the vowel and the voicing of the
following consonant. In English, a vowel before a voiced consonant is
longer than a vowel before a voiceless consonant. Table 32 presents
data from the present study for mean vowel duration before voiced and
voiceless consonants, the mean difference between the vowels in the two
contexts, and the mean ratio of the vowels in the two contexts, for all
of the speakers by language group. Mean durations of the /i/ and /I/
vowels have been combined for this analysis. The ten /I, i/ vowels
examined in this section occurred before final /t/ (3/10), /d/ (4/10),
/k/ (2/10), and /g/ (1/10). Separate analyses were performed for the
complete set of 10 vowels and for vowels only before /t/ and /d/ (a
total of 4 paired vowels). Results were essentially the same for both
analyses and it was decided to report the figures for the more complete
set.

Vowel/Consonant  Ratio  for  Superior  Speakers 

The effect of stop voicing on vowel duration was compared
statistically for the native English speakers and for the rated speaker
groups,
with  attention  focused  on  the  Superior  speakers.   Table  33  presents 
data  concerning  the  overall  mean  vowel  duration  before  voiced  and 
voiceless  consonants,  the  mean  difference  in  vowels  in  these  contexts, 
and  the  mean  ratio  of  the  difference  for  the  native  English  speakers 
and  the  rated  speaker  groups. 



Table 32: Mean vowel duration before voiced and voiceless consonants,
mean difference before voiced and voiceless consonants,
and mean ratio in English by language


Native     Vowel +       Vowel +         Mean Diff.            n
Language   Voiced C      Voiceless C     in ms        Ratio    vd.  vl.
           mean  s.d.    mean  s.d.


English    185   (24)    113   (15)      71           0.61     30   30

Arabic     151   (39)    119   (25)      32           0.80     30   30

Korean     203   (28)    133   (10)      70           0.66     30   30

Spanish    171   (44)    113   (18)      58           0.67     30   30

Thai       178   (11)    143   (11)      34           0.81     30   30



Table 33: Mean vowel duration before voiced and voiceless consonants,
mean difference before voiced and voiceless consonants,
significance level of mean difference, amount of difference,
and ratio of difference in English by speaker group


Speaker     Vowel +       Vowel +                 Mean
Group       Voiced C      Voiceless C    sign.    Diff.               n
            mean  s.d.    mean  s.d.     level    in ms    Ratio      vd.  vl.


English     185   (24)    113   (15)     .0001    71       0.61 *+    30   30

Superior    163   (19)    117   (15)     .0002    45       0.72 *+    35   35

Very Good   203   (37)    135   (15)     .0019    68       0.68 *     35   35

Accented    165   (36)    128   (25)     .0001    38       0.78       50   50


Note: * significant differences (p < .0078) between all groups on ratio
+ significant differences (p < .017) between English and Superior


Separate t tests (Student's t) indicated that there was a significant
difference in the length of the vowels before voiced and voiceless
consonants for all groups (English, p < .0001; Superior, p < .0002;
Very Good, p < .002; Accented, p < .0002).

For the rated speaker groups, analysis indicated that the groups were
different on the voiced/voiceless ratio (Kruskal-Wallis Test,
p < .0078). Separate analysis indicated that the Superior group was
different from the English group on that ratio (Wilcoxon Rank-Sum Test,
p < .017).

Vowel/Consonant  Ratio  for  Individuals 

Table  34  presents  the  data  on  the  relationship  between  the  length 
of  the  vowel  and  the  voicing  of  the  following  consonant  for  all  of  the 
native  English  speakers  and  all  of  the  individual  Superior  speakers  in 
order  of  rating  with  the  highest  rated  nonnative  speaker  first.  The 
mean  length  of  a  vowel  before  a  voiced  and  voiceless  consonant  for  the 
individual  speakers  is  included  in  Table  34. 
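
The difference and ratio columns can be recomputed from the two mean
durations, as sketched below. The direction of the ratio (voiceless
context divided by voiced context) is an inference from the tabulated
values rather than a definition stated in the text, and the function
name is illustrative. The four rows are taken from Table 34.

```python
def voicing_effect(voiced_ms, voiceless_ms):
    """Return (difference in ms, ratio) for a speaker's two vowel means.

    Assumption: the ratio is voiceless-context duration divided by
    voiced-context duration, which reproduces the tabulated values.
    """
    difference = voiced_ms - voiceless_ms
    ratio = round(voiceless_ms / voiced_ms, 2)
    return difference, ratio


# (vowel + voiced C, vowel + voiceless C) mean durations in ms, Table 34.
table_34_rows = {
    "Spanish #3": (162, 119),
    "Spanish #5": (167, 98),
    "Arabic #32": (140, 115),
    "Korean #12": (176, 122),
}

if __name__ == "__main__":
    for speaker, (voiced, voiceless) in table_34_rows.items():
        diff, ratio = voicing_effect(voiced, voiceless)
        print(f"{speaker}: difference {diff} ms, ratio {ratio}")
```

A ratio near 1.00 means the speaker lengthened vowels very little
before voiced consonants; the smaller the ratio, the larger the
voicing-conditioned difference.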

The  next  Chapter  will  review  the  results  obtained  in  this  study 
and  will  discuss  the  implications  of  those  results.   Some  conclusions 
will  be  offered. 



Table 34: Mean vowel duration before voiced and voiceless consonants,
mean difference before voiced and voiceless consonants,
and mean ratio in English for individual Superior speakers
listed in order of strength of accent rating


              Vowel +     Vowel +        Mean Diff.            n
Speaker       Voiced C    Voiceless C    in ms        Ratio    vd.  vl.

English       185         113            71           0.61     30   30

Spanish #3    162         119            43           0.73      5    5

Spanish #7    135          97            38           0.72      5    5

Spanish #5    167          98            69           0.59      5    5

Arabic #32    140         115            25           0.82      5    5

Korean #12    176         122            54           0.69      5    5

Thai #20      170         136            34           0.80      5    5

Arabic #25    189         134            55           0.71      5    5


CHAPTER  5 
DISCUSSION  AND  CONCLUSIONS 

Introduction 


Foreign accent is a result of the pronunciations of sounds or
sound sequences in a nonnative language that do not match the
pronunciations made by native speakers of a language. Traditionally,
these nonnative pronunciations have been considered separately and for
different purposes by phoneticians or phonologists. At present, many
researchers in the several areas related to second language sound
acquisition seem to be stating that one step in the development of an
adequate theoretical base for pronunciation acquisition may require a
restructuring of the traditional separation of phonetics and
phonology. This restructuring may be required because the traditional
models of sound acquisition based on phonetics or phonology do not
fully explain the acquisition of second language pronunciation.
Chapter 1 examined Flege's (1979) review of the deficits inherent in
traditional models and presented the model proposed by Flege.

Recently, Flege (1979) and others (Mitleb, 1981; Keating, 1985;
Port et al., 1980) have begun to explore the directions in which
restructuring  of  phonetic  and  phonological  models  of  second  language 
pronunciation  acquisition  may  go.   Flege  (1979)  found  evidence  that 
explanation  of  the  progress  of  acquisition  of  sounds  in  a  second 
language  requires  cross-language  comparisons  of  what  turn  out  to  be 


language-specific rules of phonetic implementation. The patterns of
production that make up these rules had previously been thought to be
physiologically determined phenomena that were irrelevant in a
language phonology. In the Phonetic Interference Model (Flege, 1979),
subsegmental or suprasegmental factors that differ between languages
are invoked as explanations for foreign accent. In a strong statement
concerning this kind of claim, Keating (1985) has even speculated that
all phonetic phenomena may be controlled by rule. Chapter 2 reviewed
the Phonetic Interference Model and evidence for this type of model.

Part of the development of models is the testing of claims,
implicit or explicit, made by a model. The present study was concerned
with testing a claim related to the Phonetic Interference Model.
Flege and Hillenbrand (1984) claim that there is a limit to the
authenticity with which sounds can be acquired in English by nonnative
speakers of English. This aspect of the Phonetic Interference Model
states that two sounds that are similar in the first and second
languages will be perceived by the second language learner as variant
realizations of the same sound. Flege (1984, 1987) has referred to
this process as "equivalence classification." Some sort of mechanism
such as the proposed equivalence classification is necessary in order
to explain how speakers of a language know that the varieties of a
particular sound, including those that are distorted and compressed by
running speech, are tokens of one sound and not several.

According to the Phonetic Interference Model, some changes may
take place in equivalence classifications of sounds as the second
language learner becomes more experienced. But limits in
reclassification of sounds are reached because of the continuing
influence of the original system. The L1 system is the basic
classification system and the one that is modified.

The  final  L2  productions  will,  according  to  the  limits  proposal  of 
the  Phonetic  Interference  Model,  be  made  with  phonetic  values  that  are 
intermediate to phonetic values in the L1 and the L2. Sounds that
are,  in  some  sense,  similar  in  the  two  competing  sound  systems  (rather 
than  sounds  that  are  completely  new)  will  be  those  that  tend  to  show 
the  effects  of  equivalence  classification.   (The  status  of  sounds  as 
"new"  or  "similar"  is  not  entirely  clear.)   The  goal  of  the  present 
study,  then,  was  to  test  the  phonetic  authenticity  of  sound  production 
by  excellent  speakers  of  English  as  a  second  language.  One  factor  in 
the  phonetic  specification  of  sounds  is  the  timing  and  the  duration  of 
phonetic  events. 

Discussion 

The  nonnative  speakers  from  four  different  language  backgrounds 
were  regrouped  for  the  purposes  of  this  study  by  a  rating  procedure 
that  ranked  the  speakers  according  to  their  perceived  strength  of 
accent  (see  Chapter  3).  The  regrouping  was  done  in  order  to  establish 
a  set  of  reliably  accepted  superior  bilinguals.   However,  the  basis 
upon  which  listeners  make  judgements  of  accent  strength  is  not  clear. 
There  are  many  parameters,  as  well  as  combinations  of  parameters,  of 
speech  production  that  may  be  involved.   The  parameter  of  subsegmental 
duration  was  being  examined  in  this  study.   But  this  parameter  may  not 
be  the  most  salient  or  important  in  listener  judgements  of  strength  of 
accent.   Flege  (1984)  has  shown  that  trained  listeners  can  detect 


French-accented English from as little as the burst of a /t/. This
type of evidence seems to indicate that duration differences such as
those examined here are detectable by the listener. However, as Flege
(1987) notes, "... a difference does not, of course, in itself
guarantee that the divergence of the L2 subjects from the phonetic
norms of the L2 being learned is linguistically, socially, or
psychologically relevant" (p. 68). Whether or not detectable duration
differences play a role in listener judgements remains an important
question for future research.

The highest ranked group of nonnative speakers, the Superior
speakers, was made up of three native speakers of Spanish, two native
speakers of Arabic, one native speaker of Korean, and one native
speaker of Thai (see Table 11). One limitation on the
generalizability of the results of the present study is the composition
of the Superior group. A more ideal situation would have been to have
recorded equal numbers of Superior speakers from all the language
groups and to have selected only speakers for the study who would be
rated as native English speakers by the listeners.

The three native Spanish speakers were the highest ranked of the
nonnative speakers, perhaps because Spanish is closer in linguistic
structure to English than are Arabic, Korean, or Thai. It may be that
syntactic and semantic differences in languages play a role in foreign
accent, at least in the sense that having fewer linguistic factors to
adjust may allow the nonnative speaker to expend more energy in
perfecting pronunciation.

However, each of these languages varied in different ways in
phonetic specification for the sounds the language shared with
English. Spanish was not necessarily "closer" in overall phonetic
specification  than  were  the  other  languages.  For  example,  English  has 
two  major  categories  of  VOT  as  does  Spanish  but  the  categories  are 
different.  Arabic  has  only  one.   Korean  and  Thai  have  three  categories 
of  VOT  but  the  categories  are  different  (see  Table  3). 

The  languages  differ  in  final  stop  closure  requirements  in  that 
English  has  voiced  and  voiceless  final  stops  and  three  ways  of  produc- 
ing each  of  them,  while  Spanish  has  only  voiced  fricatives  in  final 
position.  Korean  has  only  a  final  voiced  stop,  and  Thai  has  only  a 
final voiceless stop (see Table 4).

The  languages  also  differ  in  vowel  duration  relative  to  the  voic- 
ing of  the  final  consonant.   English,  Korean,  and  Spanish  show  a  vowel 
length  contrast,  while  Arabic  and  Thai  do  not  (see  Table  8). 

Finally,  vowel  structure  also  differs  between  the  languages. 
English has /I/ and /i/ but the other languages have only phonemic
/i/ (although Korean and Thai have allophonic /I/). Unlike English
and  Spanish,  Arabic,  Korean,  and  Thai  also  exhibit  phonemic  vowel 
length  (see  Table  6). 

When  the  Superior  speakers  were  compared  with  the  native  English 
speakers,  the  results  of  the  statistical  analysis  indicated  that  the 
members  of  the  Superior  group  were  not  homogeneous.  Variances  were 
large  around  mean  productions  and  examination  of  individual  values 
indicated  differences  in  the  directions  from  which  the  speakers  were 
approaching  authentic  English  sound  production.   Because  the  Superior 
group  and,  presumably,  the  other  nonnative  groups  were  not  homoge- 
neous, the  statistical  results  obtained  in  the  present  study  must  be 
interpreted  with  caution. 



However,  this  finding  appears  to  provide  additional  evidence  for 
the  basic  proposal  of  the  Phonetic  Interference  Model,  which  says  that 
the  sound  system  of  L2  is  not  the  only  determining  factor  in  authen- 
ticity but  rather  that  authenticity  is  determined  by  an  interaction 
between the L1 and the L2. If learning English were simply a matter
of  adding  a  segment  not  present  in  the  native  language,  as  traditional 
phonological  models  have  claimed,  then  the  set  of  nonnative  speakers 
would  have  behaved  alike.  They  did  not. 

Additional  evidence  for  some  type  of  Phonetic  Interference  Model 
seems  to  come  from  the  fact  that  the  speakers  from  different  language 
backgrounds  did  not  make  the  same  changes  in  the  phonetic  specifica- 
tion of  sounds.   If  phonetic  durations  were  physiologically  deter- 
mined, as  traditional  phonetic  models  have  claimed,  then  the  diverse 
members  of  the  group  of  Superior  speakers  would  have  made  the  same 
adjustments  in,  for  example,  the  /p/  produced  in  English,  and  this  did 
not  happen.   For  the  most  part,  explanations  of  the  differences  in  the 
sounds  produced  by  the  nonnative  speakers  and  the  sounds  produced  by 
the  native  English  speakers  can  be  related  to  the  phonetic  durations 
of  a  similar  sound  in  the  native  non-English  language. 

The  general  findings  of  this  study  also  appear  to  provide  addi- 
tional evidence  that  a  purely  phonetic  model  is  not  adequate  to 
explain  foreign  accent.  If  learning  were  merely  a  matter  of  correcting 
articulatory  placement,  then  the  Superior  speakers,  as  a  group,  should 
have  been  similar  in  their  acquisition  of  such  phonetic  requirements 
as  the  /I/  and  they  were  not. 

Four  specific  duration  parameters  were  examined  in  this  study  for 
the Superior speakers' productions in English. These four were voice
onset time, consonant closure duration, vowel duration, and vowel
duration  before  voiced  and  voiceless  consonants.  The  Superior  group 
was  contrasted  with  the  native  English  speakers  on  these  parameters. 
The  comparisons  made  also  included  the  other  groups  of  nonnative 
speakers. 

VOT 

Significant  differences  were  found  in  /b/  but  not  in  /p/  produc- 
tion of  Voice  Onset  Time  when  all  of  the  nonnative  speakers  were 
compared  with  the  native  English  speakers  (see  Table  13).   Superior 
speakers,  however,  were  not  different  from  the  native  English  speakers 
with  regard  to  VOT  for  /b/  or  for  /p/.   This  result  contradicts  the 
limit hypothesis proposed by Flege and Hillenbrand (1984). Moreover,
in previously published data, nonnative speakers have usually been
found to differ statistically from native speakers on VOT.

Production  of  /p/  by  Superior  bilinguals  varied  greatly  and  can  be 
related  to  /p/  production  in  the  native  language  of  the  individual 
Superior  speaker  (see  Tables  3  and  16).  In  the  case  of  /b/,  there  are 
indications  that  excellent  bilinguals  may  have  made  important  changes 
in  the  categorization  of  their  initial  bilabial  stops.   The  following 
subsections  will  discuss  the  possible  contributions  of  various  factors 
to  the  results. 

Influences  that  can  be  related  to  native  language.   Overall,  the 
nonnative  speakers  in  this  study  seem  to  have  solved  the  problem  of 
using  two  categories  of  VOT  values  as  required  by  English  where  their 
native  language  had  1  to  3  categories  of  VOT  (see  Table  3).  Most  of 
the nonnative speakers had formed two non-overlapping categories for
the specification of initial bilabial stop consonants in English (see
Table 12). One stop was produced with voicing that began prior to or
shortly after the burst of the stop. The second stop was produced with
voicing that began after a longer or shorter period of aspiration. The
two categories used by the nonnative speakers are related to the categories
found  in  the  native  language  (see  Figure  4).   As  expected,  the  bilin- 
guals  in  this  study  have  achieved  a  set  of  initial  bilabial  stops  that 
could  be  used  in  English.  The  question  of  authenticity  raised  by 
Flege  and  Hillenbrand  (1984)  is  concerned  with  how  close  bilinguals 
can  come  to  the  VOT  values  specified  for  English. 

Native  English  speakers'  mean  production  of  /p/  was  at  least  20  ms 
longer  than  the  productions  of  the  other  speaker  groups  in  this  study 
(see  Table  13).  This  figure  is  slightly  less  than  the  difference  (25 
ms)  obtained  by  Flege  (1980)  comparing  English  speakers  and  advanced 
Arabic  speakers.  In  the  Flege  (1980)  study,  Arabic  speakers  were 
significantly  different  from  English  speakers  but  the  standard  devia- 
tions in  that  study  were  4  ms  for  English  speakers  and  11  ms  for 
advanced  Arabic  speakers  of  English.   The  standard  deviations  in  the 
present  study  were  twice  as  large  (see  Table  13).   Large  standard 
deviations  may  have  resulted  from  grouping  speakers  who  were  approach- 
ing the  English  norms  from  different  directions. 
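
The effect of pooling on the standard deviation can be illustrated with a minimal sketch. The VOT values below are hypothetical (they are not taken from Table 13), and the variable names are invented for the illustration. The sketch only shows that two internally consistent subgroups approaching a norm from opposite directions can yield a pooled standard deviation much larger than that of either subgroup.

```python
from statistics import mean, stdev

# Hypothetical /p/ VOT values in ms, invented for illustration only.
# One subgroup approaches the English norm from shorter VOT values,
# the other from longer VOT values.
short_side = [48, 52, 55, 50, 45]
long_side = [88, 92, 95, 90, 85]

# Pooling the two subgroups, as happens when speakers are grouped by rating.
pooled = short_side + long_side

sd_short = stdev(short_side)   # sample standard deviation within subgroup
sd_long = stdev(long_side)
sd_pooled = stdev(pooled)      # inflated by the distance between subgroup means

print(f"short side: mean {mean(short_side):.1f} ms, SD {sd_short:.1f} ms")
print(f"long side:  mean {mean(long_side):.1f} ms, SD {sd_long:.1f} ms")
print(f"pooled:     mean {mean(pooled):.1f} ms, SD {sd_pooled:.1f} ms")
```

Under these assumed values the pooled mean falls between the two subgroups while the pooled standard deviation is several times larger than either subgroup's, which is the pattern suggested above for the rated groups.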

Comparisons  of  the  VOT  values  for  the  language  groups  and  the 
rated  groups  indicate  that  the  productions  of  the  rated  groups  seem  to 
combine  most  of  the  characteristics  of  the  languages  from  which  the 
speakers  came  (see  Tables  12  and  13).   For  example,  the  Accented  group 
most  closely  resembles  the  Thai  group,  which  contributed  5/10  of  the 
members  of  the  Accented  group. 



VOT for /p/. Data from the Lisker and Abramson (1964) study
presented in Table 3 show that the English long lag voicing category
in that study ranges from 15 to 70 ms. The maximum positive VOT limit
found  for  native  English  speakers  in  the  present  study  was  110  ms  (see 
Table  13).   If  the  range  for  English  /p/  is  anywhere  from  15  to  110 
ms,  it  seems  difficult  to  judge  if  the  nonnative  speakers  should  be 
considered  to  be  inaccurate  in  their  productions  of  /p/.   (Note  that 
comparisons  of  this  sort  are  made  by  comparing  pooled  mean  productions 
by  the  group.)  A  problem  that  arose  in  the  interpretation  of  the 
results  of  this  study  concerned  the  difficulty  of  deciding  what  type 
of  difference  makes  a  difference.   If  only  mean  values  are  used, 
interpretation  is  often  straightforward.   However,  when  the  range  of 
values  is  noted,  such  as  the  range  described  for  /p/  (which  may  be 
comparable  to  some  extent  across  studies),  it  can  be  seen  that  inter- 
pretation becomes  more  difficult.   Further  considerations  related  to 
/p/  VOT  will  be  presented  in  the  subsection  on  categorization  of  VOT. 

VOT  for  /b/.   More  changes  by  the  Superior  nonnative  speakers  seem 
to  have  been  made  in  /p/  production  related  to  native  language  speci- 
fication than  in  /b/  production  (see  Tables  3  and  13).  The  reason  for 
this  difference  may  relate  to  the  permissible  /b/  productions  in 
English  that  could  have  been  heard  by  the  L2  learner.   Lisker  and 
Abramson  (1964,  1967)  indicate  that  a  selection  of  the  native  English 
productions  of  /b/  in  their  study  fell  into  the  lead  category.   None 
of  the  English  speakers  in  the  present  study  produced  /b/  with  lead 
voicing.   Native  English  speakers  may  produce  a  /b/  with  continuous, 
lead, 0, or short lag VOT (see Chapter 2). Therefore, from this point
of view, the targets that nonnative speakers perceive as accurate
native  English  targets  may  vary  widely  in  onset  of  voicing. 

Lisker  and  Abramson  (1964)  also  mention  the  problem  of  continuous 
voicing  from  the  preceding  vowel  as  it  relates  to  /b/  production.   The 
term  "thru  voicing"  is  used  in  the  present  study  to  label  voicing 
tokens  that  are  continuous  from  the  preceding  vowel  to  the  burst  of 
the  stop  (see  Table  14). 

Thru  voicing.   For  rated  groups,  the  Superior  and  Accented  groups 
contained  a  similar  number  of  thru-voiced  /b/  productions  (see  Table 
15).  Native  English  speakers  in  this  study  did  not  thru  voice.  For 
all of the native Arabic and Thai speakers in English, instances of
thru voicing never exceeded the range used in lead voicing (see Table
14).   For  all  of  the  native  speakers  of  Spanish,  only  2  of  5  lead 
voiced  /b/  productions  were  outside  the  lead  voicing  range  and  both 
were  produced  by  one  speaker.   (Incidentally,  only  one  Spanish  speaker 
used  lead  or  thru  voicing  in  the  present  study.   All  of  the  other 
Spanish  speakers  in  the  present  study  used  short  lag  voicing  to 
implement /b/ VOT.)

It  could  be  the  case  that  tokens  of  thru  voicing,  such  as  those 
found  in  the  present  study,  are  examples  of  lead  voicing  that  happen 
to  begin  at  the  end  of  the  production  of  the  preceding  vowel.   If  this 
is  reasonable,  then  the  usual  practice,  followed  in  the  present  study, 
of  deleting  thru  voicing  examples  from  analysis  is  incorrect. 

Lisker  and  Abramson  (1967),  in  a  study  of  English  speakers,  found 
that  50%  of  the  /b,  d,  g/  productions  in  sentences  showed  unbroken 
voicing.  The  authors  mention  that  these  thru-voiced  productions 
should be classed with the lead productions although they cannot be so
classified because of the measurement criteria used by the authors.
It  is  true  that  because  the  voicing  is  continuous  from  the  voicing  of 
the  preceding  vowel,  there  is  no  clear  beginning  to  the  lead  voicing 
and  so  it  is  difficult  to  measure  thru  voicing  decisively.   In  other 
words,  the  lead  voicing  may  have  been  intended  by  the  speaker  to  be 
even  longer  than  it  turned  out  to  be  but  the  necessities  of  running 
speech  compression  brought  the  vowel  too  close  and,  perhaps,  over- 
lapped consonant  voicing.   It  cannot  be  shown  unequivocally  that  the 
thru  voicing  does  not  overlap  the  voicing  for  the  vowel.   The  general 
convention  of  deleting  data  on  thru-voiced  productions,  however,  seems 
to  have,  in  effect,  defined  a  second  category  of  lead  voicing  that  is 
then  discarded.  Relevant  data  may  be  lost.  For  example,  in  the 
Lisker  and  Abramson  (1964)  cross-language  study,  for  Puerto  Rican 
Spanish,  only  one  /b/,  in  sentences,  occurred  with  unbroken  voicing. 
No  /t/  or  /d/  occurred  in  sentences  with  unbroken  voicing  for  Puerto 
Rican  Spanish  in  the  Lisker  and  Abramson  (1964)  study.   Since  thru- 
voiced  values  are  not  reported,  there  is  no  way  to  estimate  the  pos- 
sible range  of  lead  voicing  in  sentences  in  Puerto  Rican  Spanish.   If 
it  can  be  shown  in  future  research  that  the  majority  of  what  have  been 
called  thru-voiced  productions  do  not  exceed  the  range  for  lead  pro- 
ductions, then  thru-voiced  productions,  measured  from  the  cessation  of 
vowel  formant  energy,  should  be  included  in  the  lead  category  of 
voicing.   At  the  very  least,  thru  voicing  values  should  be  reported 
along  with  other  categories. 
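
The criterion proposed above can be sketched as a simple check. This is only an illustration under stated assumptions: the token values are hypothetical, the function name `share_within_lead_range` is invented here, "the majority" is read as more than half of the tokens, and thru-voiced durations are assumed to have been measured from the cessation of vowel formant energy and expressed as negative VOT in ms.

```python
def share_within_lead_range(thru_ms, lead_ms):
    """Return the fraction of thru-voiced tokens whose VOT falls inside
    the observed range (minimum to maximum) of the lead-voiced tokens."""
    lowest, highest = min(lead_ms), max(lead_ms)
    inside = [v for v in thru_ms if lowest <= v <= highest]
    return len(inside) / len(thru_ms)

# Hypothetical VOT values in ms (negative = voicing precedes the burst).
lead_tokens = [-120, -95, -80, -60, -45]
thru_tokens = [-110, -90, -70, -130]

share = share_within_lead_range(thru_tokens, lead_tokens)

# Assumed reading of "the majority": more than half of the thru-voiced tokens.
include_in_lead = share > 0.5

print(f"{share:.0%} of thru-voiced tokens fall within the lead range")
print("pool with lead category" if include_in_lead else "report separately")
```

Whatever the outcome of such a check, the thru-voiced values themselves would still be reported, in keeping with the recommendation above.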

Had  thru-voiced  tokens  been  included  as  part  of  the  lead  voicing 
category in the present study, it is likely that significant differences
would have been found between the groups. This is especially
likely because the native English speakers in this study did not
thru-voice  /b/  productions.   However,  lead  voicing  range  may  differ 
between  languages  as  has  been  seen  in  Chapter  2. 

VOT  for  individual  Superior  speakers.   Three  members  of  the 
Superior  group  were  most  often  identified  as  native  English  speakers. 
The  first  speaker  (#3)  received  only  native  English  speaker  ratings, 
the  second  speaker  (#7)  received  only  1  out  of  21  ratings  as  less  than 
native  English  accent,  and  the  third  speaker  (#5)  received  2  out  of  21 
less than native ratings. All but one of the /b/ productions by
Speaker #3 were 0-10 ms, all but one of the /b/ productions by Speaker
#7 were 10 ms, and all of the /b/ productions by Speaker #5 were 0 or
10 ms. These individual values are similar to those produced by the
native English speakers (see Table 16).

On  the  other  hand,  mean  /p/  production  by  these  Superior  native 
Spanish  speakers  was  at  least  20  ms  shorter  than  /p/  produced  by 
native English speakers. Values for /p/ were intermediate between the
values specified for English and the values specified for Spanish, as
was predicted by Flege and Hillenbrand (1984). In fact, although
production  of  /p/  was  not  significantly  different  for  the  rated 
groups,  most  of  the  nonnative  speakers  (except  for  the  native  Koreans) 
produced  shorter  mean  VOT  for  /p/  than  did  the  English  speakers  (see 
Table  14). 

Categorizations  of  VOT.   The  delineation  of  Voice  Onset  Time, 
although  this  parameter  is  the  most  thoroughly  researched  of  the 
duration  parameters  in  English,  may  need  further  clarification  for 
application  to  second  language  acquisition.  The  focus  of  VOT  research 
in English has been on determining the centers of VOT categories for
particular languages. Categories of VOT which work well for English
and  some  other  languages  become  less  clear  when  cross-language  compar- 
isons are  made. 

The  three  Superior  native  Spanish  speakers  seem  to  have  made 
important  changes  in  their  VOT  specification  (see  Table  16).   These 
changes  may  be  explained  by  comparing  unlabeled  categories  for  Spanish 
and  English.  One  Spanish  category  (see  Table  3)  is  produced  with  lead 
voicing  (mean  VOT  of  -138  ms  in  isolated  words).   None  of  the  three 
speakers  used  lead  voicing.   Instead,  they  produced  VOT  values  for  /b/ 
that resemble the Spanish /p/ category (mean value of +4 ms) and that
are intermediate between the Spanish and English /p/ categories.

It  might  be  speculated  that  the  Superior  Spanish  speakers'  accu- 
racy in  producing  a  short  lag  /b/  resulted  from  the  availability  of 
the  short  lag  category  in  Spanish.   In  other  words,  all  /b/  words  in 
English  could  be  pronounced  as  if  they  began  with  Spanish  /p/.   This 
line  of  argument  would  imply  that  the  Superior  Spanish  speakers  also 
learned  a  "new"  sound  (for  English  /p/)  and  attempted  to  form  a  new 
category  for  that  sound,  although  they  were  not  completely  successful. 
The  claim  presented  by  Flege  and  Hillenbrand  (1984),  which  is  being 
tested  in  the  present  study,  would  imply  that  /p/  is  equated  with  /p/ 
and  /b/  is  equated  with  /b/  in  L2  acquisition.   This  may  be  inaccurate 
if  nonnative  speakers  are  equating  categories  of  VOT  rather  than 
labels  for  VOT  categories. 

Flege  and  Hillenbrand  (1984)  argued  for  limits  on  accuracy  based 
on  prototypes.   Similar  sounds  for  which  prototypes  exist  should  be 
unauthentic  while  new  sounds  should  be  produced  more  authentically. 
The difficulty of determining what is "new" and what is "similar" was
discussed in Chapter 1. In the case of the three Superior native
Spanish speakers, prototype categories existed in Spanish. It might
be easier to make use of an existing prototype category by some sort
of re-labeling, as may have happened in this case, than to develop a
new prototype category of VOT that is similar to, but does not match,
the L1 prototype.

If  we  examine  the  possibility  that  speakers  can  equate  categories, 
regardless  of  label,  explanation  of  patterns  of  acquisition  may  become 
clearer.   For  example,  it  is  suggested  here  that  Spanish  speakers  do 
not  attempt  to  equate  Spanish  /b/  with  English  /b/  resulting  in  a 
compromise L2 /b/. Rather, the Superior native Spanish speakers
equated  Spanish  /p/  with  English  /b/,  resulting  in  what  is  likely  to 
be  an  authentic  English  /b/.   Only  one  native  Spanish  speaker  (rated 
Accented)  produced  /b/  VOT  values  outside  the  Spanish  /p/  range  and 
with  lead  VOT  values.  This  finding  also  seemed  to  hold  for  Superior 
speakers  from  other  language  groups. 

Two  native  Arabic  speakers  were  rated  Superior  speakers  of  English 
as  a  second  language,  although  they  were  less  highly  rated  on  per- 
ceived accent  than  the  three  native  Spanish  speakers.   If  the  argument 
presented  above  is  valid,  is  there  additional  evidence  from  the  pro- 
ductions of  the  native  Arabic  speakers  in  the  present  study  that 
nonnative  speakers  are  equating  categories  of  VOT  rather  than  labels? 

The  mean  /b/  for  these  two  native  Arabic  speakers  (see  Table  16) 
is  comparable  to  the  hypothetical  /p/  (+14  ms)  postulated  by  Flege 
(1979)  as  the  /p/  that  would  exist  if  Arabic  had  a  /p/.   (In  order  to 
derive  the  hypothetical  Arabic  /p/,  Flege  extrapolated  from  the  Arabic 
/t, d/.) Therefore, it might be speculated that the Superior Arabic
speakers have made use of a category from their native language,
re-labeling it to serve in L2.

The  last  two  speakers  in  the  Superior  group  are  native  speakers  of 
Korean  and  Thai  (see  Table  16).   For  Speaker  #12,  the  first  VOT  cate- 
gory resembles  the  second  Korean  VOT  category  (mean  18  ms  in  isolated 
words). This category is often thru-voiced in Korean although Speaker
#12  does  not  thru  voice  in  English.   One  piece  of  evidence  that  argues 
against  the  interpretation  that  the  English  /b/  and  the  second  Korean 
categories  are  being  equated  is  that  21/30  (67%)  of  the  /p/  produc- 
tions by  all  the  native  speakers  of  Korean  showed  low  frequency 
energy,  which  was  not  voicing  energy,  near  the  area  where  voicing  is 
usually  found  on  spectrograms.   In  Chapter  2  of  the  present  study, 
Lisker  and  Abramson's  (1964)  discussion  of  the  two  short  lag  Korean 
phones  was  reviewed.  The  low  frequency  energy  found  in  the  present 
study  seems  to  be  the  "glottalization"  discussed  by  Lisker  and  Abram- 
son.   If  this  is  true,  then  the  Korean  speakers  are,  for  the  most 
part,  re-labeling  the  first  Korean  category  (which,  according  to 
Lisker  and  Abramson,  1964,  has  glottalization)  in  speaking  English. 
The  mean  VOT  reported  for  the  first  Korean  category  is  +5  ms  (see 
Table  3).   Mean  VOT  found  here  is  slightly  longer.   Speaker  #12  glot- 
talized  all  /p/  productions,  which  indicates  that  the  first  Korean  VOT 
category  was  being  re-labeled.  The  second  VOT  category  used  by  the 
native  Korean  speaker  in  English  seems  to  resemble  the  third  Korean 
VOT  category  (mean  91  ms). 

Speaker #20, whose native language is Thai, seems to have re-labeled
the lead voiced Thai category for English (see Tables 3 and 16).
Remember that lead voicing is permissible in English. This speaker
also has only one token of /b/ that is not thru-voiced. Lisker and
Abramson (1964) found instances of thru voicing in Thai native language
productions of /b/. The second VOT category used by speaker #20
in English is more difficult to relate to L1, although there is other
evidence  that  is  relevant  here. 

Gandour  (1974)  suggests  that  Thai  has  three  positive  VOT  phones  as 
well  as  one  lead  voiced  phone.  The  voiceless  stops  include  a  voice- 
less unaspirated,  a  voiceless  aspirated,  and  a  voiceless  breathy 
aspirated  stop.   Values  for  VOT  are  not  given  by  Gandour.   For  all  the 
Thai  speakers  in  the  present  study,  regardless  of  rating,  19  (63%)  out 
of  30  /p/  productions  showed  heavy  low  frequency  energy  starting  at 
approximately  the  level  of  the  first  vowel  formant.   Another  4  out  of 
30  showed  faint  but  discernable  energy  in  the  same  area.   The  heavy 
aspiration  occurred  on  /p/  productions  ranging  from  30-110  ms  in  VOT. 
Therefore,  it  seems  likely  that  the  Thai  speakers  were  re-labeling  the 
category  used  in  Thai  that  Gandour  (1974)  called  voiceless,  breathy, 
and  aspirated  for  many  if  not  all  of  the  English  /p/  productions. 
Speaker  #20  showed  heavy  aspiration  on  3  out  of  5  of  the  /p/  produc- 
tions. 

If the implications, based on a limited sample of superior bilinguals'
VOT productions, are borne out by future research, a clarification
of the concept of new versus similar sounds as they relate to the
claim  made  by  Flege  and  Hillenbrand  (1984)  is  possible.   For  a  set  of 
sounds  (such  as  stop  consonants)  where  onset  of  voicing  is  an  impor- 
tant parameter  in  production,  accurate  prediction  of  second  language 
acquisition  may  be  related  to  available  categories  of  VOT  in  use  in 
the native language. If a category is available both in L1 and L2,
superior bilinguals will re-label (if necessary) the L1 category for
use in L2. The resulting L2 productions may or may not differ from the
norms for L2.

Finally,  determination  of  VOT  categories  is  complicated  by  the 
range  of  categories  and  the  values  within  the  categories  which  have 
been  found  to  be  acceptable  within  native  English.  While  studies  of 
the  type  performed  here  are  generally  based  on  mean  values,  it  is  not 
clear  that  this  accurately  reflects  production  data.   Flege  and  Brown 
(1982)  demonstrated  that  laryngeal  activity  in  English  for  initial 
stop  production  can  vary  from  lead  onset  of  closure  to  short  lag  onset 
of  closure  of  the  glottis  and  still  produce  an  acoustic  record  of 
short lag voice onset. Also, Lisker and Abramson (1964, 1967) found
that /b/ is produced in English with thru, lead, simultaneous,
and short lag VOT. It is difficult to determine what targets
are  the  norms  for  English.  Of  course,  comparisons  are  made  within  the 
limits  of  the  context  of  a  particular  study  but,  since  none  of  the 
English  speakers  in  the  present  study  produced  VOT  for  /b/  in  the  lead 
range,  it  is  difficult  to  decide  if  they  did  so  because  of  the  nature 
of  English  or  because,  by  chance,  no  speakers  who  implemented  VOT  with 
lead  voicing  were  included  in  the  present  study.   Other  studies 
(reviewed  in  Chapter  2)  strongly  suggest  that  the  second  possibility 
is  the  case.   It  is  suggested  here  that  future  research  should  be 
directed  to  determining  range  restrictions  in  productions  for  duration 
variables  in  English  such  as  VOT  and  the  other  variables  examined  in 
this  study. 


Final  Consonant  Closure 

In  this  study,  Superior  speakers  were  not  different  from  native 
English  speakers  on  the  length  of  their  /t/,  the  length  of  their  /d/, 
or  on  the  relative  duration  of  the  voiced  and  voiceless  consonants. 
This  finding  seems  to  contradict  the  limit  hypothesis  proposed  by 
Flege  and  Hillenbrand  (1984). 

The  comparisons  of  all  the  nonnative  speakers  and  the  native 
English  speakers  (see  Table  20)  indicated  that  some  of  the  nonnative 
speakers  differed  from  the  English  speakers  on  the  length  of  both  of 
the  final  stop  consonant  closure  durations.   There  were  no  differences 
between  any  of  the  native  and  nonnative  speaker  groups  on  the  /t/  to 
/d/  ratio. 

Most  importantly,  overall  mean  closure  duration,  in  this  study, 
increased  directly  with  increase  in  degree  of  perceived  accent  (see 
Table  20).   In  addition,  the  closer  the  rating  of  the  nonnative 
speaker  group  to  English,  the  more  options  have  been  implemented  for 
the  release  of  final  consonants.  Contributing  factors  will  be  dis- 
cussed in  the  subsections  that  follow. 

As  expected,  the  native  English  speakers  generally  produced  a  mean 
final  /t/  longer  than  their  mean  final  /d/  (see  Table  20).   Most  of 
the  nonnative  speakers  also  showed  a  longer  /t/  than  /d/.   The  overall 
mean  closure  duration  for  the  stops  as  well  as  the  mean  closure  dura- 
tion for  the  individual  sounds  was  longer  for  nonnative  speakers  in 
English  than  for  native  English  speakers.   The  English  speakers  also 
had  smaller  variances  in  their  means  than  did  the  nonnative  speakers. 
Part, but not all, of the differences in length can be attributed to
the tendency of the American English speakers to flap /t, d/ in
intervocalic position. (Unreleased final stops were deleted from the
comparisons and will be discussed separately.)

Influences  that  can  be  related  to  native  language.   The  increase 
in  mean  closure  duration  related  to  strength  of  accent  does  not  seem 
to  be  related  to  the  language  composition  of  the  groups.   The  results 
for  Arabic  are  similar  to  findings  from  other  studies  (Port  et  al., 
1980;  Port  &  Flege,  1981)  in  showing  no  differences  in  closure  dura- 
tion for  the  voiced  and  voiceless  stops  (see  Table  17).  The  other 
languages of concern here do not have a final voiced/voiceless contrast.
Korean requires final /t/ (although both voiced and voiceless
consonants are allowed intervocalically), Thai requires final /d/
(although voiceless), and Spanish requires a voiced fricative in final
position  (see  Table  4).   Native  speakers  of  Korean  and  Thai  may  have 
adjusted  to  differentiating  final  stops  by  closure  duration  in 
English.   It  was  expected  that  native  Spanish  speakers  would  learn  to 
use  a  stop  instead  of  a  fricative  in  final  position  and  they  did. 

Contrary  to  the  data  presented  as  mean  scores,  most  of  the  speak- 
ers from  each  language  group  were  differentiating  /t/  and  /d/  by  at 
least  7  ms  (see  Table  18).  The  speakers  did  seem  to  be  producing  some 
difference  between  /t/  and  /d/  although  the  difference  could  have  been 
in  favor  of  either  /t/  or  /d/.   When  these  differences  are  in  opposite 
directions,  they  may  cancel  each  other  out,  resulting  in  a  finding  of 
no  difference  between  /t/  and  /d/. 

The  finding  that  native  English  speakers  produce  /t/  and  /d/  in 
different  ways  does  not  contradict  other  studies.   Sharf  (1962),  in  a 
study of bisyllabic words, found that intervocalic /t/ and /d/ were
produced with a 40 ms difference and that /d/ was the longer of the
two  consonants. 

The most extreme example of individual variation, in the present
study,  comes  from  the  native  Spanish  speakers,  who  produced  two 
/t/=/d/,  two  /t/>/d/,  and  two  /t/</d/  (see  Table  18).  Therefore, 
although  the  /t,  d/  contrasts  do  not  exist  in  final  position  in  the 
language  (Korean,  Spanish,  Thai)  or  are  not  made  in  that  position  in 
the  language  (Arabic),  most  of  the  speakers  are  differentiating  /t/ 
and  /d/  based  on  length  in  English  by  at  least  as  much  as  the  native 
English  speakers.  It  should  be  noted,  however,  that  this  difference 
was  only  16  ms  for  native  English  speakers  in  this  study. 

This  finding  has  some  implications  for  studies  which  measure  final 
consonant  closure  duration.   For  example,  for  the  Arabic  speakers  in 
the  present  study,  the  mean  /t/  was  100  ms  and  the  mean  /d/  was  105  ms 
(see  Table  17).   This  implies  that  the  Arabic  speakers  are  not  differ- 
entiating the  two  consonants.   However,  two  of  the  Arabic  speakers 
were  producing  a  mean  /t/  longer  than  their  mean  /d/  while  three 
speakers  were  producing  a  shorter  /t/  than  /d/  (see  Table  18).   The 
mean  absolute  difference  produced  by  the  native  Arabic  speakers  was  24 
ms.  If  positive  and  negative  values  are  added,  the  mean  difference 
between  the  two  consonants  for  Arabic  speakers  would  have  been  given 
as  5  ms,  rather  than  the  absolute  difference  of  24  ms.  The  advanced 
Arabic  speakers  in  Flege's  (1979)  study,  unlike  the  native  Arabic 
speakers  in  the  present  study,  did  significantly  differentiate  /t/  and 
/d/.  The  less  advanced  Arabic  speakers  did  not  differentiate  the  two 
consonants.  It  is  not  clear  whether  the  values  in  the  Flege  (1979) 
study or the values in other studies were based on absolute differences
between the /t/ and /d/.
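
The cancellation described above can be illustrated with a minimal sketch. The per-speaker differences below are invented for illustration (they are not the values in Table 18), and the function names are hypothetical. The sketch only shows that a mean of signed differences and a mean of absolute differences can diverge sharply when speakers differ in the direction of the contrast.

```python
from statistics import mean

# Hypothetical per-speaker mean differences in ms, computed as /d/ minus /t/.
# Two speakers have a longer /t/ (negative), three have a longer /d/ (positive).
# These numbers are invented for illustration; they are not from Table 18.
differences_ms = [-30, -20, 25, 25, 25]

def signed_mean_difference(diffs):
    """Average with signs kept, so opposite directions cancel out."""
    return mean(diffs)

def absolute_mean_difference(diffs):
    """Average of magnitudes, so the direction of the contrast is ignored."""
    return mean([abs(d) for d in diffs])

signed = signed_mean_difference(differences_ms)
absolute = absolute_mean_difference(differences_ms)
print(signed, absolute)
```

With values like these, the signed mean would suggest almost no contrast while the absolute mean shows that each speaker is separating the two consonants, which is the reporting issue raised for the Arabic group.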

Examination of absolute differences for /t/ and /d/ for the rated
groups (see Table 21) indicates that differentiation of the two consonants
increases as strength of rating of accent increases. The Superior
group is similar to the native English speakers, while the
Accented  group  differentiates  the  consonants  by  twice  as  much  as  do 
the  Superior  speakers  and  the  native  English  speakers.   Superior 
speakers  seem  to  have  shortened  their  final  consonant  closure  duration 
to  match  the  norms  of  the  native  English  speakers  (see  Table  20). 

Flapping. The advanced speakers in Flege's (1979) study flapped
only  2  out  of  216  final  alveolar  stops.   It  may  be  that  the  advanced 
Arabic  speakers  in  Flege's  study  (who  were  advanced  only  because  they 
had been in this country longer than the other groups of Arabic speakers)
flapped less because they were attempting to speak more formally
in English. The findings of the present study are contrary to the
findings in Mitleb (1981), where the English speakers were significantly
different in the duration of /t/ and /d/. But Mitleb's (1981)
results may have been influenced by flapping: 80% of the
English /d/s and 56% of the /t/s were flapped.

The native English speakers differentiated unflapped /t/ and /d/
by only 16 ms, which is consistent with other findings for English (see
Table  21).   One  of  the  native  Spanish  speakers,  one  of  the  native 
Arabic  speakers  and  the  native  Thai  speaker  also  differentiated 
unflapped /t/ and /d/ by only a small amount, although the differences
for the native Spanish speaker (#5) and the native Thai speaker (#20)
result from a mean /d/ that is longer than the mean /t/ so that the
difference  value  is  negative  (see  Table  24).  For  the  native  Spanish 
speaker  (#5)  this  result  is  based  on  only  three  productions,  while  all 
of  the  productions  by  the  native  Thai  speaker  (10)  are  included  in 
that  speaker's  mean. 

Only  the  native  Spanish  speakers  flapped  as  much  as  the  native 
American  English  speakers  (see  Table  22).  The  English  speakers 
flapped  73%  of  their  final  /d/s  and  36%  of  their  final  /t/s.  Native 
speakers  of  Arabic,  all  of  whom  had  British  instructors  in  their  early 
English  exposure,  flapped  the  least.   This  is  reasonable  since  British 
English  does  not  flap  across  word  boundaries.   Flege  (1979)  found  that 
none  of  his  Arabic  speakers  flapped,  while  Mitleb  (1981)  found  that 
many  of  his  native  Arabic  speakers  flapped.   This  optional  flapping 
rule  in  English  may  be  acquired  by  the  more  experienced  speakers  but 
acquisition  may  also  depend  on  how  American  the  nonnative  speakers 
want  to  sound. 

One  factor  influencing  the  amount  of  flapping  is  differential  use 
of  flaps  in  different  speaking  situations.  Zue  and  Laferriere  (1979) 
found  a  difference  in  length  of  medial  /t,  d/  production  for  men  and 
women.   The  women  tended  to  produce  longer  medial  alveolar  stops  in  a 
reading  task  and  even  aspirated  medial  /t/.   Zue  and  Laferriere  (1979) 
suggest  that  the  women  are  reacting  to  the  situation  of  careful  speech 
production  (a  reading  task  in  a  laboratory)  by  articulating  full 
hypercorrect  forms  (Labov,  1966).  The  women,  but  not  the  men,  are 
using  more  formal  allophones  of  the  sounds  that  are  specific  to  the 
formal  speech  situation. 

Nonnative  speakers,  even  those  who  have  acquired  the  optional 
flapping  rule,  may  also  produce  hypercorrect  forms  in  a  laboratory 
situation. Therefore, the nonnative speakers' productions in this
situation may reflect their abilities in English or may
reflect their perception of the speech situation. One difference
between  the  studies  by  Flege  (1979)  and  Mitleb  (1981)  (where  one  study 
had  tokens  of  flapped  sounds  and  the  other  did  not)  was  that  Mitleb  is 
a  native  speaker  of  Arabic  and  Flege  is  not.  This  difference,  alone, 
may  have  determined  the  formality  of  the  speaking  situation. 

In  the  present  study,  efforts  were  made  to  define  the  speaking 
situation  so  that  less  than  the  strictest  formal  speech  would  be 
elicited.  The  fact  that  flapped  tokens  were  found  may  indicate  that 
this  goal  was  achieved,  at  least  to  some  degree. 

Final  consonant  closure  options.   It  may  be  that  there  are  three 
ways  for  nonnative  speakers  to  adjust  final  stop  closure  duration  in 
meeting  English  phonetic  norms.  The  speaker  may  flap  /t/  and  /d/ 
often  enough  to  reduce  the  overall  closure  duration,  or  may  shorten 
the  closure  duration,  or  both.   (Nonrelease  of  final  consonants  is 
another  option  that  will  be  discussed  separately.) 

Sociolinguistic  research  has  indicated  that  an  optional  rule  may 
be  applied  proportionally  in  speech  to  achieve  an  overall  effect.  For 
example, Labov (1972) found that as speaking situation changed, the
individual proportion of velar /ŋ/ to /n/ in verb endings changed. In
casual situations, a higher proportion of velar /ŋ/ productions is
changed to /n/. (This casual pattern is reflected in such written
productions as "I was walkin' along".) In more formal situations, the
velar ending tends to be produced. The speech situation cannot be judged
from one or two tokens but must be looked at in terms of the proportion
of each ending over time.

It  may  be  speculated  that  one  explanation  for  the  performance  of 
the  Superior  speakers  can  be  based  on  the  proportional  strategy 
described above. In addition, the sociolinguistic notion of hypercorrection,
discussed above, may perhaps apply to the
productions  of  Speaker  #7  who  produced  all  /t/  and  /d/  tokens  between 
5  and  35  ms  in  length,  with  half  the  /t/  and  /d/  closures  being  only  5 
ms  long.   This  speaker  may  have  overdone  the  shortening  of  the  closure 
duration in an attempt to adhere to the English norms. A more traditional
argument would be that the results obtained in this study are
due  to  speakers  being  in  different  stages  of  mastery  of  final  alveolar 
stop  closures  rather  than  that  the  results  are  due  to  what  would  be  a 
sophisticated  timing  pattern. 

Closure duration data for the speaker groups appear to show
a pattern relating acquisition of L2
duration parameters to perceived strength of
accent. However, even Superior speakers produce the final stops with
longer closure times than do the native English speakers (20 ms longer
including flaps, 37 ms longer excluding flaps), although the difference
was not statistically significant when the rated groups were
contrasted (see Tables 20 and 23).

The  three  most  highly  rated  speakers  produced  an  overall  stop 
closure  duration  close  to  the  unflapped  overall  stop  closure  duration 
used  by  the  native  English  speakers  (see  Table  24).   For  two  of  the 
three  speakers,  however,  the  unflapped  closure  duration  was  about  as 
long  as  the  unflapped  closure  duration  of  the  least  highly  rated 
speakers. The second and third rated speakers often flapped /t/ and
/d/  although  the  most  highly  rated  speaker  flapped  less  often. 

These  findings  suggest  a  possible  interaction  between  flapping  and 
overall  stop  closure  duration.  Speakers  #3  and  #5  may  have  flapped 
just  enough  to  reduce  their  mean  stop  closure  duration  to  a  value 
close  to  the  mean  for  native  English  speakers.   Speaker  #7  flapped 
every  /t/  and  /d/  and  produced  a  stop  closure  duration  much  smaller 
than  any  of  the  native  English  speakers.   These  three  speakers  were 
native  Spanish  speakers. 
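
The suggested interaction between flapping and overall closure duration can be sketched as a simple weighted mean. This is only an illustration under stated assumptions: the flap and unflapped durations and the target value below are hypothetical, not measurements from this study, and the function names are invented here.

```python
def mean_closure_ms(flap_share, flap_ms, unflapped_ms):
    """Overall mean closure duration for a mix of flapped and unflapped stops."""
    return flap_share * flap_ms + (1.0 - flap_share) * unflapped_ms

def flap_share_needed(target_ms, flap_ms, unflapped_ms):
    """Share of flapped tokens that would bring the overall mean to target_ms."""
    if unflapped_ms == flap_ms:
        raise ValueError("flapped and unflapped durations must differ")
    share = (unflapped_ms - target_ms) / (unflapped_ms - flap_ms)
    # Clamp to [0, 1]; a raw value outside that range means the target
    # cannot be reached by changing the amount of flapping alone.
    return min(1.0, max(0.0, share))

# Hypothetical values: 100 ms unflapped closures, 20 ms flaps, 60 ms target mean.
share = flap_share_needed(target_ms=60.0, flap_ms=20.0, unflapped_ms=100.0)
overall = mean_closure_ms(share, flap_ms=20.0, unflapped_ms=100.0)
print(share, overall)
```

Under these assumed values, a speaker whose unflapped closures remain long could still reach a native-like overall mean by flapping a sufficient share of tokens, which is the kind of trade-off proposed for Speakers #3 and #5.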

The Arabic speakers did not flap, but the more highly rated
speaker  (#32)  had  a  shorter  overall  consonant  closure  duration  than 
did  the  less  highly  rated  speaker  (#25).   The  Thai  speaker  did  not 
flap  but,  perhaps,  did  not  choose  to  because  this  speaker's  closure 
duration  averaged  76  ms,  which  approximates  the  English  norm.   The 
Korean  speaker  also  came  within  16  ms  of  the  native  English  speakers 
but chose to flap most /t, d/ productions.

It might be the case that excellent nonnative speakers are sensitive,
to some extent, to some of the rules governing casual speech and
formal speech as reflected in the timing of final stop consonant
release. It is likely that second language learners hear unflapped,
or  even  aspirated,  final  stops  in  the  classroom,  but  hear  flapped  and 
unreleased  final  stop  models  outside  the  classroom.   The  strong  claim 
could  be  made  that  superior  bilinguals  have  acquired  a  variety  of 
timing  patterns  related  to  the  speech  situation.   A  weaker  version  of 
this claim might be that the L2 speakers will choose to acquire targets
heard outside the classroom and would, therefore, have a model of
mostly  short  (flapped)  final  alveolar  stops. 

Unreleased  final  stops.   Another  option  for  release  of  final 
consonants  in  English  is  to  hold  the  closure  for  the  consonant  until 
the  onset  of  a  following  vowel.  These  unreleased  final  consonants  are 
usually  deleted  from  analysis.   As  was  pointed  out  in  the  section  on 
thru  voicing,  it  may  be  that  a  category  of  final  consonant  has  been 
defined  and  then  discarded  when  release  values  are  not  reported.  No 
published  data  are  available  on  the  length  of  unreleased  consonants 
across  word  boundaries.  Therefore,  it  is  not  clear  whether  there  is  a 
limit  to  the  permissible  duration  of  the  closure  for  a  final  stop 
consonant  in  English  or  whether  other  languages  permit  unreleased 
final  stops. 

The  native  English  speakers  in  the  present  study  always  released 
final /d/ but did not always release final /t/ (see Table 27). However,
final /d/ was flapped more often by the native English speakers
in  this  study  than  was  final  /t/  (see  Table  22),  which  was  also  the 
finding  reported  by  other  studies  (Flege,  1979;  Mitleb,  1981).  Flege 
reported  that  25%  of  the  native  English  speakers'  final  stops  were 
unreleased,  which  is  similar  to  the  results  found  in  the  present 
study.  Native  Arabic  speakers  in  Flege's  study  produced  11%  unreleased 
final  stops.  Only  3%  of  the  Arabic  productions  in  the  present  study 
were  unreleased  (see  Table  27).  Native  Korean  speakers  were  the  only 
other  group  to  produce  many  unreleased  final  consonants.   Perhaps 
learning  authentic  English  requires  mastery  of  several  duration 
options  for  final  stops.  Segmental  explanations  for  accent  would  have 
difficulty dealing with a variety of release options. (One step in
deciding the value of this kind of information is to document
unreleased final stops.)

One observation that can be made is that the ways in which final
alveolar stops are released seem to have influenced rating of accent
strength  (see  Table  28).  The  Accented  group  uses  few  of  the  available 
release options. This group produced only 6 out of 100 stops that were
either  unreleased  or  flapped.  The  Very  Good  group  produced  a  quarter 
of  their  stops  as  flapped  or  unreleased.  The  Superior  group  produced 
43%  unreleased  or  flapped  stops.   The  native  English  speakers  produced 
60%  of  their  stops  as  unreleased  or  flapped.   This  seems  to  indicate 
that  nonnative  speakers  need  to  acquire  several  durations  specific  to 
alveolar  stops,  rather  than  only  one,  in  order  to  achieve  authentic 
production  of  English  final  stops. 
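
The group comparison above rests on a simple proportion: the share of final stops that are flapped or unreleased. A minimal sketch of that tally follows. The token labels and the function name are assumptions made for illustration, and the 4/2 split between flapped and unreleased tokens is invented; only the total of 6 out of 100 for the Accented group comes from the text.

```python
from collections import Counter

def flapped_or_unreleased_share(tokens):
    """Share of final stops realized as a flap or left unreleased."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one token is required")
    return (counts["flapped"] + counts["unreleased"]) / total

# Hypothetical token list for the Accented group: 6 of 100 stops are
# flapped or unreleased (reported total); the 4/2 split is invented.
accented_tokens = ["released"] * 94 + ["flapped"] * 4 + ["unreleased"] * 2
accented_share = flapped_or_unreleased_share(accented_tokens)
print(f"{accented_share:.0%}")
```

The same tally applied to each rated group would give the figures compared in the paragraph above, so that the groups can be ordered by how many of the available release options they use.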

Problems with range and acceptable values were
referred to earlier. Must a speaker reproduce a specific duration, or
is  it,  instead,  a  proportional  relationship  that  is  required?   It  is 
suggested  here  that  further  research  be  directed  at  the  possibility  of 
a  proportional  relationship. 

This type of research also has wider implications.
The norm for acquisition in English requires further delineation,
especially from the point of view of situational variations in
timing. Acquisition of a second language is not necessarily correlated
with acquisition of citation form productions. Because of this,
norms  developed  in  the  phonetic  study  of  English  may  not  be  relevant 
in  the  phonetic  study  of  second  language  acquisition. 

The  findings  in  the  present  study  seem  to  indicate,  first,  that 
final consonant closure duration increases as distance from native
English increases. This finding suggests a purely durational factor.
However,  additional  evidence  has  been  offered  that  seems  to  point  to 
the  ability  to  implement  a  number  of  durationally  different  release 
options  as  also  being  related  to  strength  of  accent.   Natural  sounding 
American  English  reflects  the  use  of  short  closure  durations  and  also, 
perhaps,  the  use  of  unreleased  and  flapped  final  closures.  Of  course, 
this  proposal  is  specific  only  to  /t/  and  /d/  since  other  final  stops 
are  not  flapped.   However,  unreleased  final  /p,  b,  k,  g/  are  also 
common in American English. It can be argued that too high a proportion
of released stops sounds artificial and overly precise in American
English.

Vowel  Duration 

The  Superior  speakers,  as  a  group,  were  differentiating  /i/  and 
/I/  and  they  were  not  different  from  the  native  English  speakers  on 
vowel  length  according  to  the  statistical  analysis.   This  result  seems 
to  contradict  the  limit  hypothesis  proposed  by  Flege  and  Hillenbrand 
(1984). 

Comparisons across all of the nonnative speakers indicated that
they were similar to the native English
speakers for the length of the /i/ but not for the length of the /I/
(see  Table  30).   All  of  the  groups,  except  the  Very  Good  group,  were 
differentiating  the  two  vowels  by  length.   The  Superior  speakers  were 
different from at least some of the less highly rated nonnative speakers
on the relative length of the two vowels, although they were not
different  from  the  native  English  speakers  on  that  ratio. 

One  problem  that  occurs  with  close  consideration  of  vowel  duration 
is  the  interaction  of  factors  such  as  phonemic  vowel  length,  presence 
or  absence  of  a  sound  in  the  phonology,  and  status  of  the  sound  as 
phonemic or allophonic. Phonemic vowel difference is about 50%
between long and short versions of the same vowel, and the English
tense/lax  ratio  is  much  smaller.   Few  of  the  nonnative  speakers  in  the 
study produced a ratio of /I/ to /i/ near to the phonemic vowel difference
ratio.

Influences that can be related to native language. One generalization
often made is that vowel duration for nonnative speakers in
conversational  English  seems  too  short,  especially  for  native  speakers 
of  Asian  languages.   However,  as  can  be  seen  here,  Korean  and  Thai 
speakers  had  longer  mean  vowel  durations  in  English  than  did  the 
native  English  speakers  (see  Table  29).   It  is  possible  that  the  vowel 
durations  in  the  carrier  sentences  were  longer  than  they  would  have 
been  in  a  more  conversational  sample. 

Native  Spanish  speakers'  average  vowel  durations  were  close  to  the 
English  mean  while  the  native  Arabic  speakers'  average  vowel  durations 
were  shorter  than  those  found  for  English  by  14  ms  (see  Table  29). 
The  native  Arabic  speakers  in  this  study  were  much  closer  in  average 
vowel  duration  to  the  native  English  speakers  than  native  Arabic 
speakers  were  in  Flege's  (1979)  study,  which  found  that  average  vowel 
duration  for  advanced  native  Arabic  speakers  averaged  33  ms  shorter 
than  the  average  vowel  duration  for  the  native  English  speakers.   The 
native  Arabic  speakers  in  Flege's  study  (1979)  produced  vowels  that 
were  midway  between  the  vowel  durations  for  the  average  Arabic  long 
vowel  (179  ms)  and  the  average  Arabic  short  vowel  (98  ms).  The 
results  of  the  present  study  are  the  same.   The  vowel  durations  for 
native  Arabic  speakers  are  midway  between  Arabic  long  and  short  vowel 
durations.  However,  native  English  speakers  produced  shorter  vowels  in 
the  present  study  than  they  did  in  Flege's  (1979)  study. 

Korean  and  Thai  both  have  allophonic  variations  of  /i/  vowels  that 
are  realized  as  /I/  (see  Table  6).  Arabic  and  Spanish  do  not.  The 
native  speakers  of  Spanish,  however,  seem  to  be  the  nonnative  speakers 
who  least  differentiate  /i/  and  /I/  in  speaking  English.   It  may  be 
that accuracy in vowel duration in L2 is related both to the occurrence
of phonemic vowel length in L1 and to the occurrence or nonoccurrence
of the sound in L1 as well as to the length of the sound in
L1.

Arabic, Korean, and Thai all have phonemic vowel length and Spanish
does not (see Table 6). Native speakers of Spanish, therefore,
have  neither  the  /I/  sound,  nor  the  parameter  of  duration  (in  terms  of 
phonemic  vowel  length)  to  apply  to  differentiating  vowels  when  they 
learn  to  speak  English.  The  native  Spanish  speakers,  as  a  group,  seem 
to  produce  /I/  and  /i/  with  approximately  the  same  length  in  speaking 
English. 

Contrary  to  the  findings  by  Mitleb  (1984a)  and  Flege  (1979),  the 
native  Arabic  speakers  in  this  study  did  not  produce  an  /I-i/  vowel 
duration difference of close to 50% in English. They seemed to produce
the /I/ and /i/ with different durations, both of which were
shorter  than  the  length  of  the  corresponding  vowels  produced  by  native 
English  speakers.  Comparison  data  for  native  speakers  of  the  other 
languages  of  concern  here  are  not  available.   However,  the  Korean  and 
Thai  native  speakers,  who  also  have  phonemic  vowel  length,  do  not  seem 
to  have  used  the  0.5  long/short  vowel  ratio  common  for  differentiating 
vowels phonemically (see Table 29). Although the overall vowel duration
for native Korean speakers and native Thai speakers was similar,
their  /I-i/  ratios  seemed  to  be  different.   Native  Thai  speakers 
seemed  to  produce  a  larger  difference  between  /I/  and  /i/  (37  ms)  than 
did  the  Koreans  (15  ms). 

The  dissimilarity  between  the  language  groups  suggests  that  the 
speakers  of  different  languages  may  be  required  to  make  different 
types  of  adjustments  in  vowel  length  when  speaking  English.   Speakers 
of languages with phonemic vowel length did not seem to behave alike,
nor did speakers of languages that do not have one of the two vowels in
the  phonetic  inventory  of  the  language. 

Vowel  duration  for  Superior  speakers.  Superior  speakers,  as  a 
group,  seem  to  have  developed  an  /I/  that  is  only  a  little  shorter 
than  the  English  /I/  (see  Table  30).  The  finding  for  /i/  suggests 
that this sound may have been affected by L1. Remember that none of
the non-English languages have an /I/ phoneme (although the phone does
occur as an allophone in Korean and Thai), but all of the languages
have an /i/. Generally, the difference between /I/ and /i/ is less for
the speakers in the Superior group than for the native English speakers.
The Accented group had the largest difference (31 ms) between
/I/ and /i/. The difference for the native English speakers was 20 ms
and  the  other  two  rated  speaker  groups  had  a  10-12  ms  difference 
between  the  two  vowels.   The  results  of  the  statistical  analysis  seem 
to  have  been  influenced  by  the  different  native  speakers  combined  in 
the  rated  groups. 

The  Very  Good  group  is  heavily  loaded  with  native  Korean  speakers 
(4/7).  Two  of  the  other  three  speakers  in  this  group  are  native 
Spanish  speakers.  The  Spanish  and  Korean  language  groups  were  those 
who  least  differentiated  the  two  vowels.   This  presumably  accounts  for 
the  statistical  finding  that  the  Very  Good  group  is  different  from  the 
other  nonnative  speakers. 

Vowel  duration  for  individual  Superior  speakers.   Two  of  the  three 
native  Spanish  speakers  were  the  most  highly  rated  of  the  nonnative 
speakers. Speaker #3 (the most highly rated nonnative speaker)
produced  a  mean  /I/  longer  than  his  mean  /i/  and  did  not  differentiate 
the  two  vowels  by  length  (see  Table  31).   Vowels  for  Speaker  #7  are 
both  outside  the  range  of  means  for  the  native  English  speakers. 
Speaker  #5  produced  the  vowels  with  equal  length.   These  findings 
suggest  that  vowel  duration  may  play  little  role  in  judgements  by 
native  speakers  of  English  on  strength  of  accent. 

Two  native  speakers  of  Arabic  were  included  in  the  Superior  group. 
Mean  overall  vowel  duration  was  21  ms  shorter  for  Speaker  #32  than  for 
the  native  English  speakers  while  Speaker  #25  produced  a  mean  vowel 
duration 13 ms longer than the native English speakers. These speakers'
mean vowel duration was within the lower limits of the range for
native  English  speakers.   Speaker  #32  was,  however,  rated  higher  than 
#25  in  the  strength  of  accent  ratings.   Both  native  Arabic  speakers 
produced  an  /I-i/  ratio  close  to  that  of  the  native  English  speakers. 
Inspection  of  the  individual  data  for  the  native  Arabic  speakers 
showed  only  2  out  of  10  tokens  below  100  ms  in  length. 

Flege  (1979)  found  that  Arabic  short  vowels  centered  around  96  ms 
and  Arabic  long  vowels  averaged  176  ms.   Mean  vowel  duration  for 
Speaker #32 occurred about midway between these values while the mean
for Speaker #25 was closer to the value for the long vowel. Speaker
#25  produced  some  of  the  longest  vowel  duration  tokens  in  the  Superior 
group  but  was  rated  the  least  native  of  the  Superior  speakers. 

The  last  two  speakers  in  the  Superior  group  were  one  native 
speaker  of  Korean  and  one  native  speaker  of  Thai.  Overall  vowel 
duration  for  these  two  speakers  resembled  the  overall  vowel  duration 
for  the  native  English  speakers.   Values  for  the  individual  vowels 
fell  within  the  range  of  English  values  (see  Table  31). 

An  interesting  finding  is  that  none  of  the  speakers  differentiated 
/I/ and /i/ by what would be considered long/short vowel parameters
(/I/  about  half  the  length  of  /i/).   It  could  be  that  some  of  the  Thai 
speakers  in  the  present  study  were  the  only  speakers  who  confused 
phonemic  vowel  length  with  the  tense/lax  contrast  found  in  English. 
The  speakers  did  not  seem  to  class  /i/  with  the  long  vowels  and  /I/ 
with  the  short  vowels  where  their  native  languages  had  the  long/short 
phonemic distinction. Vowel duration related, for example, to consonant
voicing, may be a more important factor in the perception of
foreign  accent  than  vowel  duration  alone.   Relative  vowel  duration 
will  be  discussed  in  the  following  section. 

Vowel/Consonant  Ratio 

Findings from relative duration of vowels before voiced and voiceless
consonants appear to confirm the limit hypothesis proposed by
Flege  and  Hillenbrand  (1984).  The  rated  groups  showed  a  significant 
difference  between  vowels  before  voiced  and  voiceless  consonants  but 
the  smaller  size  of  that  difference  for  nonnative  speakers  of  English 
was  seen  to  be  significantly  different,  at  least  for  the  Superior 
speakers.  The  effect  of  stop  voicing  on  vowels  is  much  greater  for 
the  native  English  speakers  than  for  the  Superior  group.   Vowels 
before  voiceless  consonants  are  more  nearly  equal  for  the  English  and 
Superior groups than are vowels before voiced consonants. The vowel
before a voiceless consonant is the shorter duration and is the duration
which would have been likely to correspond for native and nonnative
English speakers, if any durations did match.

The  previous  section  has  shown  that  all  the  groups  except  the  Very 
Good  group  were  differentiating  /i/  and  /I/  by  length.   Although  the 
Very  Good  group  does  not  differentiate  these  vowels,  they  do  show  a 
length  difference  for  vowels  related  to  the  voicing  of  the  following 
consonant. 

Influences that can be related to native language. The Arabic and
Thai  languages  do  not  exhibit  a  difference  in  vowel  duration  based  on 
the  voicing  of  the  following  consonant  (see  Table  8).  The  speakers  of 
Arabic  and  Thai,  in  this  study,  produced  the  smallest  mean  difference 
(32  ms  and  34  ms)  (see  Table  32).  It  is  surprising  that  the  native 
Arabic  speakers  produced  different  vowel  lengths  based  on  consonant 
voicing.  Other  published  studies  (Flege,  1981;  Mitleb,  1983;  Port  & 
Mitleb, 1983) reported a nonsignificant vowel duration difference
related  to  the  voicing  of  the  following  consonant  for  native  Arabic 
speakers  in  English.   The  mean  difference  found  here  for  both  Arabic 
and  Thai  speakers,  however,  is  about  halfway  between  no  difference  and 
the  0.71  ratio  found  for  native  English  speakers.   This  would  imply 
that  the  Arabic  and  Thai  speakers  in  this  study  are  attempting  to 
produce  the  English  contrast. 

The Korean and Spanish languages do show this durational difference
(see Table 8). The ratio produced by native Koreans in English
is  midway  between  the  Korean  ratio  and  the  English  ratio  (see  Table 
32). 

The  native  Spanish  speakers  produced  a  ratio  of  0.67,  which  is 
much  closer  to  the  0.71  ratio  found  for  English  than  it  is  to  the  0.86 
ratio  specified  for  Spanish  (see  Tables  8  and  32).  Spanish  speakers, 
as  a  group,  do  not  differentiate  /i/  from  /I/  by  length,  although  they 
do  produce  different  durations  for  vowels  relative  to  the  following 
consonant  voicing. 
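
The "midway" and "closer to" comparisons in this section can be made explicit by expressing an observed ratio as progress from the native-language value toward the English value. The following is a minimal sketch: the function name is hypothetical, and it assumes that the ratios are on a scale where 1.0 means no voicing-related difference in vowel duration. The three input values (0.86 for Spanish, 0.71 for English, 0.67 produced by the native Spanish speakers) are the figures quoted above.

```python
def progress_toward_target(l1_ratio, l2_ratio, observed_ratio):
    """Fraction of the distance from the L1 ratio to the L2 ratio covered.

    0.0 means the speaker kept the native-language ratio, 0.5 is the
    midway point, 1.0 is the English value, and a result above 1.0 means
    the observed ratio lies beyond the English value.
    """
    if l1_ratio == l2_ratio:
        raise ValueError("L1 and L2 ratios must differ")
    return (l1_ratio - observed_ratio) / (l1_ratio - l2_ratio)

# Ratios quoted in the text for the native Spanish speakers as a group.
spanish_progress = progress_toward_target(
    l1_ratio=0.86, l2_ratio=0.71, observed_ratio=0.67
)
print(round(spanish_progress, 2))
```

A measure of this kind would allow speakers of different native languages to be compared on one scale, whether or not the contrast exists in the native language.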

Vowel/consonant  ratio  for  individual  Superior  speakers.   The 
individual  Superior  speakers  more  closely  resemble  other  less  highly 
rated  speakers  than  they  do  each  other.   Superior  native  Spanish 
speakers  (#3,  #7,  and  #5)  seem  to  have  increased  the  vowel  duration 
contrast  based  on  voicing  from  the  0.86  contrast  specified  in  Spanish 
(see  Tables  8  and  34).  For  Speakers  #3  and  #7,  this  increase  was  to 
the  midway  point  predicted  by  Flege  and  Hillenbrand  (1984).   It  is 
interesting  that  Speaker  #5  seems  to  have  overdone  the  shortening  as 
this  speaker  did  with  overall  vowel  duration  as  well  as  the  duration 
of  the  final  stop  consonant. 

Arabic  does  not  have  the  final  contrast  under  investigation  in 
this section (see Table 8). The less highly rated native Arabic
speaker  (#25)  seems  to  have  achieved  a  better  approximation  of  the 
English  norm  than  did  the  more  highly  rated  native  Arabic  speaker 
(#32)  (see  Table  32).   Speaker  #25  also  produced  longer  mean  vowels 
than  did  Speaker  #32. 

The  Korean  speaker  (#12)  reduced  the  contrast  in  Korean  to  about 
midway  between  the  norms  for  English  (.61)  and  Korean  (.78)  (see 
Tables 8 and 34). The Thai speaker (#20) achieved a contrast not
present  in  the  Thai  language,  although,  like  the  Arabic  speaker  (#32), 
the  Thai  speaker's  contrast  was  less  than  required  for  English  (see 
Tables 8 and 34). The Superior native Thai speaker may have applied
the  phonemic  vowel  length  distinction  discussed  above.   However,  it 
should  be  noted  that  this  speaker  did  not  seem  to  apply  the  long/short 
vowel  contrast  to  differentiate  /I/  and  /i/  when  the  ratio  of  the  two 
vowels  is  examined.  The  native  Thai  speaker  produced  a  set  of  vowels 
and  a  ratio  which  matches  that  of  the  other  speakers  of  Thai.  The  same 
general relationship between a Superior speaker and the native language
was also found for the other nonnative speakers (see Tables 32
and 33).

Examination  of  the  vowel/consonant  ratio  and  related  data  for  the 
individual  members  of  the  Superior  group  indicates  that  except  for 
Speaker  #5,  all  of  the  Superior  speakers  produced  a  much  smaller 
difference  between  the  vowels  related  to  the  voicing  of  the  following 
consonant  than  was  produced  by  the  native  English  speakers  (see  Table 
34). 

Where the contrast was present in the native language, for individual
speakers, relative duration of vowels before voiced and voiceless
consonants was about midway between the native language and English
values. Where the contrast did not exist, speakers seemed to approximate,
although not closely, the English contrast.

The results related to vowel duration and relative vowel duration
seem to indicate that the nonnative speakers in this study were producing
durations that were affected by the duration requirements
of their native languages. Not all of these native language
durations are known. Some trends seen here suggest that the differential
lengthening for the long/short vowel contrast was applied by the
native Thai speaker to the stop related vowel contrast and to the


-149- 
tense/lax  vowel  contrast  when  the  amount  of  the  contrasts  are 
examined.  Tense  vowels  averaged  37  ms  longer  than  lax  vowels  and 
vowels  before  voiced  consonants  averaged  34  ms  longer  than  vowels 
before  voiceless  consonants  for  the  entire  Thai  group.  The  individual 
Thai  speakers  were  consistent  in  this  pattern,  as  reflected  by  the 
small  standard  deviation  (see  Table  33). 

It can be speculated that, for Thai speakers, the timing that is
transferred from L1 to L2 is specified in terms of amount of
difference. If this is the case, the native Thai speakers might
produce a tense/lax ratio close to that of English even though their
specific vowel lengths are different from those produced by native
English speakers.
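
As a purely illustrative sketch of this speculation, the short Python
example below shows how a fixed difference in milliseconds translates
into a ratio that depends on the absolute durations involved. The 37 ms
tense/lax difference is the Thai group average reported above; the lax
vowel durations, the function name, and the convention of dividing the
shorter duration by the longer one are assumptions introduced only for
illustration and are not measurements from this study.

```python
def short_long_ratio(short_ms: float, diff_ms: float) -> float:
    """Return shorter/longer duration when the longer vowel exceeds
    the shorter one by a fixed difference (all values in ms)."""
    if short_ms <= 0 or diff_ms < 0:
        raise ValueError("duration must be positive and difference non-negative")
    return short_ms / (short_ms + diff_ms)


# Reported in the text: Thai group tense vowels averaged 37 ms longer
# than lax vowels.
TENSE_LAX_DIFF_MS = 37.0

# Hypothetical lax vowel durations (ms), chosen only for illustration.
HYPOTHETICAL_LAX_MS = [80.0, 120.0, 160.0]

ratios = [short_long_ratio(lax, TENSE_LAX_DIFF_MS) for lax in HYPOTHETICAL_LAX_MS]

for lax, ratio in zip(HYPOTHETICAL_LAX_MS, ratios):
    tense = lax + TENSE_LAX_DIFF_MS
    print(f"lax = {lax:5.1f} ms, tense = {tense:5.1f} ms, ratio = {ratio:.2f}")
```

Under these assumed durations, the same 37 ms difference gives a ratio
that rises as the vowels become longer, so whether a transferred
difference yields an English-like ratio would depend on the speaker's
absolute vowel durations.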

Conclusions 

The first chapter of the present study argued that the type of
research undertaken here was needed, stimulated at least in part by
problems encountered in the teaching of pronunciation and by the
general feeling among those who teach pronunciation that the
theoretical foundations for teaching are weak. A new model has been
proposed that seems to explain more data related to the acquisition of
sounds than did earlier models. A test of that model, in this study,
upheld the general tenets of the new model. Changes are made in
subsegmental parameters of speech production by L2 speakers.

With regard to the specific claim being tested in the present
study, the findings indicate that the Superior speakers were authentic
(that is, there was no statistically significant difference between
the native English speakers and the nonnative Superior speakers) for
eight of the nine parameters measured. However, the results must be
interpreted cautiously due to limitations in the design of the study.
Trends seen here indicate that superior bilinguals are different from
less superior bilinguals. The best individual speakers' productions
match those of the native English speakers for at least half of the
parameters measured. But even the most highly rated speakers, who
sound as if they were native born English speakers, produced sounds
that do not match the phonetic norms for English in all cases.
Superior bilinguals move toward L2 phonetic norms from directions
dictated by the phonetic norms of their L1. Examination of individual
data indicates that superior speakers more closely resemble less
highly rated speakers of their L1 than they resemble each other.

The  superior  speakers  do  seem  to  have  mastered  the  phonemes  and 
allophonic  variations  used  in  English  (for  example,  the  release 
options  for  final  stops).   In  order  to  do  so,  they  may  have  re-labeled 
VOT  categories  to  serve  in  English,  which  is  a  possibility  not  pre- 
viously suggested  in  the  literature. 

If this re-labeling hypothesis is supported by further research,
it implies teaching strategies based on a comparison of L1 and L2 VOT
categories. When the L1 has a VOT category which can serve in the L2,
the language learner can be taught to use that category. For example,
native Spanish speakers can be taught to pronounce English /b/ words
with Spanish /p/. In addition, the flapped /t/ and /d/ phone is an
allophone of /r/ in many languages and can be re-labeled for use in
English.

It is possible that excellent bilinguals might sound to listeners
as if they speak a slightly different dialect of English from the
listener's, but a dialect that is acceptably English. In other words,
productions by these speakers may seem to fall within a still
undetermined range of acceptable variation for dialects of American
English.

Data  are  not  available  on  the  range  of  acceptable  variation  in 
dialects  of  American  English  in  natural  speech.   This  is  true,  in 
part,  because  phonological  studies  of  English  focus  on  more  abstract 
units  than  are  of  concern  here. 

This  is  also  true,  in  part,  because  traditional  phonetic  research 
tends  to  use  citation  forms  or  very  careful  speech.   Researchers 
concerned  with  more  natural  speech  are  generally  more  interested  in 
the  compression  effects  of  various  factors  related  to  speed  than  in 
range  of  variation.   But  speech  in  careful  form  is  a  distortion  of 
natural  speech.  Words  seldom  occur  in  citation  form.   If  the  norms 
that  nonnative  speakers  must  acquire  were  citation  forms,  then  the 
best  speakers  would  be  those  nonnative  speakers  closer  in  time  to  the 
classroom  where  citation  forms  are  most  often  heard  by  second  language 
learners.  It  seems  more  likely  that  the  best  speakers  will  choose 
targets  that  are  related  to  the  norms  of  conversational  speech.  Very 
little  information  is  available  concerning  the  phonetic  norms  of 
conversational  speech  in  English.   Some  directions  for  further 
research  in  this  area  have  been  offered  here.   The  most  valuable 
foundations  for  teaching  pronunciation  will  be  obtained  from  theoreti- 
cal models  that  take  into  account  the  rules  of  ordinary  conversation. 


APPENDIX  A 
INSTRUCTIONS  AND  RATING  SCALE 

Instructions 


You  will  be  listening  to  a  series  of  five  tapes.   The  tapes 
contain  samples  of  speech  from  Americans  and  foreigners  who  speak 
English  well.  Each  speaker  will  say  the  phrase:  "His  friends  say  he 
is  looking  for  the  pot  of  gold  at  the  end  of  the  rainbow." 

Your  task  is  to  rank  the  speakers  on  a  3  point  scale  according  to 
how  much  foreign  accent  they  seem  to  have.   Do  not  pay  attention  to 
the  quality  of  their  voices  but  only  to  the  amount  of  accent  each 
speaker  seems  to  have.   Many  of  the  speakers  are  American  and  have  no 
accent.  You  would  circle  1  for  these  speakers.   If  the  speaker  seems 
to  have  a  light  accent  but  one  that  is  definitely  there,  circle  the  2. 
If  the  speaker  has  a  medium  or  heavy  accent,  circle  the  3.   Are  there 
any  questions  about  the  ranking? 

The  first  tape  you  will  hear  has  15  speakers,  the  other  3  tapes 
have  8  speakers  each.   Please  write  down  the  tape  number  when  it  is 
given  to  you.   Are  there  any  questions? 



Sample  Rating  Sheet 


Your  name 


Tape  Number 


You  will  hear  a  tape  of  8  speakers.  Please  rank  the  speakers  by 
how  much  FOREIGN  ACCENT  they  seem  to  have.  Circle  the  number  of  your 
choice. 

1  =  no  accent,  probably  American 

2  =  light,  but  detectable  foreign  accent 

3  =  medium  to  heavy  foreign  accent 

Speaker number      Circle your choice

      1                 1    2    3
      2                 1    2    3
      3                 1    2    3
      4                 1    2    3
      5                 1    2    3
      6                 1    2    3
      7                 1    2    3
      8                 1    2    3


APPENDIX  B 
INSTRUCTIONS  FOR  EXPERIMENTAL  PROCEDURE 


I  would  like  you  to  read  some  words  for  me.   They  are  English 
words  but  some  of  them  are  names  so  you  may  not  know  how  to  pronounce 
them.   Try  the  words  on  this  list  so  I  can  help  you  with  the  ones  you 
don't  know.   Some  of  them  rhyme  like  beat-Pete.   Try  the  words. 

You will be pronouncing the words in a carrier sentence. The
sentence is "I say ______ again today." If the word is beat, you
say, "I say beat again today." Say the words on the list in the
carrier sentence.

I  am  going  to  record  you  saying  the  words  in  the  carrier  sentence 
now.   Take  your  time  and  try  to  pronounce  them  as  well  as  possible. 


This  paragraph  is  called  the  Rainbow  Passage.  Please  read  it 
through  aloud  for  me.  Are  there  any  words  you  had  a  problem  with?  I 
am  going  to  record  you  reading  the  Rainbow  Passage  now. 




APPENDIX  C 
LANGUAGE  BACKGROUND  QUESTIONNAIRE 


 1. Name ____________________
 2. Age ______                         3. Years in U.S. ______
 4. Occupation ____________________
 5. Country of birth ______________    6. Years there ______
 7. Other countries lived ____________________
 8. Father's birthplace ____________________
 9. Mother's birthplace ____________________
10. First language ____________________
11. Other child L. ____________________
12. Years of English study ______     13. Started ______
14. Native English instructors ______ 15. When ______
16. Languages of residence ____________________
17. Married ______                    18. Spouse's Native Language ______
19. Self rating of being understood
    a. poor         b. fair         c. excellent
20.                 English language use now, %     Native Language use
    a. home
    b. work
    c. friends
    d. family
    e. reading






BIOGRAPHICAL  SKETCH 

Anna Marie Schmidt graduated from Eckerd College (Florida
Presbyterian College) in St. Petersburg, Florida. She obtained an M.A.
in sociolinguistics from the University of Pennsylvania.




I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it 
conforms  to  acceptable  standards  of  scholarly  presentation  and  is 
fully  adequate,  in  scope  and  quality,  as  a  dissertation  for  the 
degree  of  Doctor  of  Philosophy. 


Howard  B.  Rothman,  Chairman 
Associate  Professor  of  Speech 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it 
conforms  to  acceptable  standards  of  scholarly  presentation  and  is 
fully  adequate,  in  scope  and  quality,  as  a  dissertation  for  the 
degree  of  Doctor  of  Philosophy. 


William S. Brown, Jr.
Professor  of  Speech 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it 
conforms  to  acceptable  standards  of  scholarly  presentation  and  is 
fully  adequate,  in  scope  and  quality,  as  a  dissertation  for  the 
degree  of  Doctor  of  Philosophy. 




Jean Casagrande
Professor of Lin


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the 
Department  of  Speech  in  the  College  of  Liberal  Arts  and  Sciences 
and  to  the  Graduate  School  and  was  accepted  as  partial  fulfillment 
of  the  requirements  for  the  degree  of  Doctor  of  Philosophy. 


April  1988 


Dean,  Graduate  School 




